Abstract

In this paper we introduce a novel framework for the distributed control of DTNs. The proposed mechanism
is meant to tackle a core problem of such systems: how to induce coordination among the relays of a DTN in order to
deliver messages from a source node to a destination in a non-cooperative fashion.
Devices acting as active relays, in fact, sacrifice part of their batteries in order to support message replication
and thus increase the probability of reaching the destination. In our scheme, we assume that relays choose between
two strategies: either to participate in message relaying, or not to participate in order to save energy.
We introduce a coordination mechanism based on the notion of minority game. We first study the performance of
DTNs where relays compete to be in the population minority. The scheme unveils new performance figures under
competition among relays and defines a novel welfare of the DTN based on the number of active relays and the message
delivery probability. Using this tool, the network operator can control the DTN operating point. To this end,
we extensively characterize the possible equilibria of this game. We further extend the analysis to heterogeneous
multi-class DTNs. Finally, we propose a stochastic learning algorithm that provably drives the system to
the optimal solution. We provide extensive numerical results to validate the proposed scheme.
I. INTRODUCTION

Delay Tolerant Networks (DTNs) are designed to cope with scarce coverage. Thus, the standard
problem is how to maximize the delivery probability of a message under constraints on the resources
spent to forward it to its destination.
To this end, efficient routing was studied first. The aim is to avoid greedy solutions such as epidemic
routing, where the success probability is maximized together with the number of message copies [23],
[10]. In an effort to optimize network performance under various resource constraints, several
papers have further included activation and/or forwarding control at relays [18], [15], [9].
However, due to limited energy or memory capacity, relays cannot always be active and participate
in message routing. For instance, owners of relay devices such as smartphones or tablets may not be
willing to have their batteries depleted to sustain DTN communications. From the forwarding standpoint,
in turn, massive deactivation of relays becomes a core threat. Under two-hop routing, for instance, a linear
decrease in the number of relays causes an exponential increase in the probability that the message is not delivered.

In this work we assume that the decision to participate in relaying or not is taken autonomously by
relay nodes according to an incentive scheme. Incentives engender a competition among relays, which play
strategies on their activation. The objective here is to attain an operating point for the DTN that is
the solution of a joint optimization problem involving the number of active relays and the energy cost.

The relay activation control, in turn, is fully decentralized and does not require additional control
messages. To achieve this, we use a novel and specific utility structure. Such utility is rooted in the
following trade-off: the success of a tagged relay depends explicitly on the number of opponents met,
namely, nodes adopting the same strategy. In fact, the larger the number of relays participating in
message delivery, the higher the delivery probability of the message, but the lower the chance for
the tagged relay to receive a reward from the system. The global activation target determines the number of
opponents of a randomly tagged relay, i.e., the active fraction of the population.

Overall, the work is built around this new approach: by modeling the competition of relay nodes as a
coordination game, we show that it is possible to enforce cooperative behavior within a population of
relays through competition. In fact, we rely on the theory of the Minority Game (MG) [17], which is
rooted in dynamical competition. The MG does not require explicit coordination among the relay nodes: this
makes it attractive because control messages among DTN nodes may experience unpredictable delays
due to the lack of persistent connectivity. In the last part of this paper, we investigate how to account
for the presence of heterogeneous agents. The MG governs both the performance of competing relays and the welfare
of the DTN (number of message copies and message delivery probability), and is thus an appropriate
tool to drive the network to a desired operating point. We thoroughly investigate the properties of our
coordination game, in which relays compete to be in the population minority.

Finally, since the MG scheme governs the number of active relays, the message source can achieve a
target performance figure, e.g., the probability of successful message delivery, by setting the reward
mechanism appropriately. Conversely, the source can reduce the quality of service in order to reduce
the relays' energy consumption. Thus, our incentive mechanism can match quality-of-service metrics
such as the delivery probability to the available resources.

Compared to the existing literature, the novelty of this work lies in the way the activation and
forwarding process is jointly controlled by the network operator, acting on a distributed mechanism
that takes place among competing relays based on the MG. We specialize the new mechanism
in two frameworks: the first is the single-class model (namely, the homogeneous DTN case), and the
second takes into consideration the existence of several classes of nodes (namely, the heterogeneous
DTN case). Finally, we provide an algorithmic formulation of the game and demonstrate that the solution
of the MG can be attained through the adaptation of each agent's expectations about the future.

A. Background and contribution 

The minority game studies how individuals of a population of heterogeneous agents may reach a form
of coordination when sharing resources whose utility decreases with the number of competitors.
Upon introducing adaptation of strategies based on each agent's expectations about the future, the game can
describe a dynamical system with many interacting degrees of freedom, where cooperation is implicitly
induced among agents. The MG was first introduced in the literature as a simplification of the El Farol
Bar attendance problem [17], [14]. In the El Farol bar problem [12], users decide independently
whether to go to the only bar in Santa Fe that offers entertainment. However, the bar is small, and
they enjoy it only if at most $\Psi$ of the possible attendees are present, in which case they obtain a reward
$r$ at a cost $0 < c < r$ for going to the bar. Otherwise, they can stay home and watch the stars with utility
0. Players have two actions: go, if they expect the attendance to be less than $\Psi$ people, or stay at home
and watch the stars, if they expect the bar to be overcrowded. In its original formulation as a single-stage
game, the El Farol bar game has $\binom{N}{\Psi}$ pure Nash equilibria and a single symmetric mixed Nash equilibrium
at zero utility, where each player uses the value of $p$ such that

$$\sum_{h=0}^{\Psi-1} \binom{N-1}{h} p^h (1-p)^{N-1-h} = \frac{c}{r}.$$
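For illustration, the symmetric mixed equilibrium of the single-stage El Farol game can be computed numerically. The sketch below is our own and is not taken from [12], [14] or [17]; the function names are hypothetical. It relies on the left-hand side of the condition above (the probability that at most $\Psi - 1$ of the other $N - 1$ players attend) decreasing from 1 at $p = 0$ to 0 at $p = 1$ when $\Psi < N$, so bisection on $p$ finds the unique root for any $c/r \in (0, 1)$.

```python
from math import comb


def room_prob(p: float, n: int, psi: int) -> float:
    """P(at most psi - 1 of the other n - 1 players go) when each goes w.p. p."""
    return sum(comb(n - 1, h) * p**h * (1 - p) ** (n - 1 - h) for h in range(psi))


def el_farol_mixed_ne(n: int, psi: int, c: float, r: float, tol: float = 1e-12) -> float:
    """Symmetric mixed NE: find p with room_prob(p) = c / r by bisection.

    Assumes 0 < c < r and 1 <= psi < n, so a unique root exists in (0, 1).
    """
    if not (0 < c < r and 1 <= psi < n):
        raise ValueError("need 0 < c < r and 1 <= psi < n")
    target = c / r
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if room_prob(mid, n, psi) > target:
            lo = mid  # going still pays off in expectation: raise p
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    p = el_farol_mixed_ne(n=100, psi=60, c=1.0, r=2.0)
    # expected utility of going, r * room_prob - c, should be ~0 at the equilibrium
    print(p, 2.0 * room_prob(p, 100, 60) - 1.0)
```

At the returned $p$, the expected utility of going is numerically zero, which matches the zero-utility property of the mixed equilibrium stated above.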

The extension of the game introduces a learning component based on each player's belief about future
attendance: the only information available is the number of people who came to El Farol in
past weeks. In particular, [22] and its follow-up [21] apply the concept of evolutionary MG to complex
networks, including random and scale-free networks. The authors of [16] apply MGs to cognitive radio
networks for the design of MAC layers.

All these works consider an odd number of interacting agents and do not provide an exact analysis of
the equilibrium points, as we do in this paper; a further key added value of our work is the application
of a standard economic estimator, namely, the logit belief model, which provides a suitable convergence
framework for our mechanism design. Finally, from the application standpoint, and to the best of our
knowledge, this is the first time the concept of MG is applied to DTNs with the aim of deriving a mechanism
to induce coordination in a non-cooperative fashion.

The remainder of this paper is organized as follows. In Sec. II we introduce the system model and
the notation used throughout the paper. Results on the equilibria of the MG are derived in Sec. III.
The extension to the multi-class DTN case is provided in Sec. IV. A distributed reinforcement learning
algorithm able to drive the system to the desired operating point is derived in Sec. V. In Sec. VI we
study a particular case of application. Numerical results validating the outcomes of the theoretical
analysis are reported and discussed in Sec. VII. Final remarks are reported in Sec. VIII.

II. Network model 

We consider a Delay Tolerant Network with several sources $s_i$, destinations $d_i$, and a large number of
mobiles acting as relay nodes in the system. Each mobile is equipped with a wireless interface allowing
communication with other mobiles in its proximity. Messages are generated at the source nodes and
need to be delivered to the destination nodes; however, each such message is relevant only for a time interval
of length $\tau$: this is also the horizon over which we intend to optimize network performance.

The network is assumed to be sparse: at any time instant, nodes are isolated with high probability.¹
Nevertheless, due to mobility patterns, communication opportunities arise whenever two nodes get within
mutual communication range, i.e., a "contact" occurs. The time between subsequent contacts of
any two nodes is assumed to be a random variable. In the particular scenario of Sec. VI, we
will consider the two-hop routing scheme, in which any mobile that receives a copy of the packet from
a given source can only forward it to its destination.

Consider a message generated at $t = 0$: each source node attempts to deliver the message to its
destination, possibly by spreading several copies among the relay nodes. Each such
message contains a time stamp reporting its age and can be deleted when it becomes irrelevant, e.g.,
after time $\tau$. Due to the lack of permanent connectivity, we exclude the use of feedback that would allow the
sources or other mobiles to know whether the message has been successfully delivered to its destination
or not. For the same reason, our activation mechanism should not require centralized
coordination, and any such scheme should run in a fully distributed fashion on board the relay nodes.

Remark 1: Since in DTNs the sources can predict neither the forwarding path nor the minority
community nodes, some reward models assume, for example, that the reward is distributed by the
current intermediate nodes without the involvement of the source. This can be realized using, for
example, the layered coin method proposed in [24].

A. Network Game 

In this section we detail the payoff structure of the proposed mechanism. When a message is generated
by a source node, the competition takes place during the message lifetime, i.e., over a duration $\tau$. Each
mobile has two strategies: either to participate in forwarding, i.e., pure strategy transmit (T), or not
to participate, i.e., pure strategy silent (S). Mixed strategies, i.e., probability distributions over the two
possible actions, are also possible and will be described later on.

Each strategy corresponds to a certain utility for the relay. Let us now detail how the minority game
develops. First, let $\Psi > 0$ be the threshold fixed by some operator (e.g., the source nodes): it defines the
majority/minority of nodes using the two policies. Hence, the utility of a player is designed in such a way
that, upon successful delivery of the message to the destination, an active mobile may receive a positive
expected reward, conditional on active mobiles representing the minority and on the mechanism
selected by the network operator. In this case, the other nodes receive the opposite, i.e., a non-positive expected
reward. The customary way to interpret this non-positive reward is as a regret for abstention.

Formally, let $N$ be the total number of nodes involved in the competition. The probability that an
active mobile relays the copy of the packet to the destination within time $\tau$ is denoted by $1 - Q_\tau$, where
$Q_\tau$ is the probability that the tagged relay does not succeed in relaying the message to the destination.

At time $t = 0$, each relay plays T or S: players who take the minority action win, whereas the
majority loses.

¹This is also the case when disruption caused by mobility occurs at a fast pace compared to the typical operation time of protocols,
e.g., the TCP/IP protocol suite.
Now, let $N = N_T + N_S$, where $N_T$ (resp. $N_S$) is the number of agents selecting strategy T (resp.
S). A tagged relay playing strategy T is a member of the minority if $N_T < \Psi$; otherwise it loses. Silent
agents win if $N_S < N - \Psi$. The probability that an active relay receives a reward is a function of the
inter-meeting rate, the message lifetime, the reward mechanism used by the operator, and the number of active relays. The
total reward is $R = \sum_s r^s P^s_{succ}(T, k, s)$, where $P^s_{succ}(T, k, s)$ is the probability that an active node receives the
reward $r^s$ from source $s$ when $k$ nodes are active.

We denote by $g$ the energy spent per unit time by a relay node while it remains active during $[0, \tau]$, so that its total activation cost is $g\tau$.

From the sources' point of view, performance should be guaranteed above some target level, $D_{succ} \geq
D^*_{succ}$, where $D_{succ}$ is the probability of successful delivery of a message:

$$D_{succ} = 1 - \prod_{k=1}^{N_T} Q_\tau = 1 - Q_\tau^{N_T} \qquad (1)$$

and $D^*_{succ}$ is the performance threshold imposed by the source. Recall the fundamental trade-off: a higher
delivery probability comes at the price of a larger value of $N_T$, and hence a larger energy cost for active nodes.
The connection between the network performance and the game depends on the total reward $R$ set by
the network operator for successful delivery, where each $r^s$ is decided by source $s$: larger rewards
cause more nodes to be active, which yields a higher delivery probability at the expense of battery
depletion and network lifetime. To define the reward so as to attain a given performance
level, we let the threshold $\Psi$ obey the relation

$$\sum_s r^s P^s_{succ}(T, \Psi, s) = g\tau$$

where $g > 0$ is a constant activation cost per second for each relay. Note that $\Psi$ is chosen so
as to equalize the total energy cost spent by nodes for being active in $[0, \tau]$ and the expected reward
obtained for a successful delivery (see Fig. 1). In the homogeneous case ($P^s_{succ} = P_{succ}\ \forall s$), in which
relays and sources have similar physical characteristics, e.g., transmission range, mobility patterns,
energy capacities, etc., the last relation becomes

$$N_s\, r\, P_{succ}(T, \Psi) = g\tau \qquad (2)$$

where $N_s$ is the number of sources in the network.
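As a minimal sketch of how relation (2) pins down $\Psi$, the code below solves it by bisection over a real-valued $k$. It assumes, purely for illustration, a two-hop model with exponential relay-destination meetings of rate $\lambda$, in which each of the $k$ active relays holds a copy from $t = 0$ and only the first relay to meet the destination within $\tau$ is rewarded, so that $P_{succ}(T, k) = (1 - e^{-k\lambda\tau})/k$; this form is decreasing in $k$, as Assumption A below requires. The model, the parameter values and the function names are hypothetical and not taken from the paper.

```python
import math


def p_succ_two_hop(k: float, lam: float, tau: float) -> float:
    """ASSUMED model for P_succ(T, k): k active relays each hold a copy from t = 0,
    relay-destination meetings are exponential with rate lam, and only the first
    relay to meet the destination within tau is rewarded."""
    return (1.0 - math.exp(-k * lam * tau)) / k


def threshold_psi(n_sources, reward, g, tau, lam, n_max, tol=1e-9):
    """Real-valued Psi solving N_s * r * P_succ(T, Psi) = g * tau on [1, n_max].

    Bisection is valid because P_succ is decreasing in k.
    Returns None if even one active relay cannot recover its energy cost.
    """
    def excess(k):
        return n_sources * reward * p_succ_two_hop(k, lam, tau) - g * tau

    lo, hi = 1.0, float(n_max)
    if excess(lo) <= 0:
        return None
    if excess(hi) >= 0:
        return hi  # every relay can be active with non-negative utility
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # illustrative numbers only: lam * tau = 0.1, g * tau = 1, one source
    print(threshold_psi(n_sources=1, reward=50.0, g=0.01, tau=100.0, lam=0.001, n_max=200))
```

With these illustrative values the threshold falls between 49 and 50 active relays, while a reward of 10 is too small for even a single relay to break even, and the function returns `None`.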

We now state the assumption required on the function $P^s_{succ}(T, k, s)$.

Assumption A

The function $P^s_{succ}(T, k, s)$ is decreasing in $k$, i.e., in the number of active relays.

Now we can introduce two utility functions for our game, under the assumption that the population
of sources is homogeneous, i.e., $P^s_{succ}(T, k, s) = P_{succ}(T, k)\ \forall s$:
Scenario 1: Zero-sum utility

$$U(T, N_T) = \sum_s r^s P^s_{succ}(T, N_T, s) - g\tau, \qquad U(S, N_S) = -U(T, N_T)$$

Scenario 2: Fixed regret utility

$$U(T, N_T) = \sum_s r^s P^s_{succ}(T, N_T, s) - g\tau, \qquad U(S, N_S) = -a, \ \forall N_S$$

where in the second case the utility of non-active nodes expresses the regret or satisfaction for not
participating in message relaying. In particular, we assume $a > 0$, and we define $N_T^a$ such that

$$U(T, N_T^a) = -a.$$
Fig. 1. Outcome picture of the game as observed by an active node: the intersection corresponds to the threshold value for the minority
being attained by active nodes, i.e., $N_T = \Psi$.

The formulation of Scenario 1 requires nodes to estimate $P_{succ}$. This can be done over time
by interrogating neighboring nodes and averaging their success rates: this amounts to running a pairwise
averaging protocol as in [13]. If we want to avoid the use of gossip mechanisms, we can model the
regret of non-active nodes as a constant negative perceived utility, which corresponds to Scenario 2.
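The pairwise averaging mentioned above can be sketched as follows. This is a generic randomized-gossip illustration under our own assumptions, not necessarily the exact protocol of [13]. Each node starts from a local estimate of its success rate, and at every contact the two nodes replace their estimates with the average of the two. Since each exchange preserves the sum, the estimates converge to the network-wide average provided the contact sequence keeps mixing all nodes. The contact trace and the function name are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def pairwise_average(estimates: Sequence[float],
                     contacts: Sequence[Tuple[int, int]]) -> List[float]:
    """Run pairwise averaging over a sequence of contacts (i, j).

    At each contact both nodes set their estimate to the mean of the two,
    which leaves the sum of all estimates unchanged.
    """
    x = list(estimates)
    for i, j in contacts:
        avg = 0.5 * (x[i] + x[j])
        x[i] = avg
        x[j] = avg
    return x


if __name__ == "__main__":
    rng = random.Random(1)
    n = 10
    # hypothetical local success-rate estimates, one per node
    local = [rng.random() for _ in range(n)]
    # hypothetical contact trace: 5000 uniformly random pairwise meetings
    contacts = [tuple(rng.sample(range(n), 2)) for _ in range(5000)]
    final = pairwise_average(local, contacts)
    print(sum(local) / n, min(final), max(final))
```

In this run the minimum and maximum of the final estimates should both be essentially equal to the true average of the initial estimates.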

Remark 2: In minority games with an odd number of opponents, different types of equilibria have been
characterized numerically, e.g., see Challet and Zhang [7], Moro [17]. The minority rule sets the comfort
level at $(N_T, N_S) = (\Psi, N - \Psi)$, and computer simulations show that the participation rate fluctuates
around $\Psi$ in a $(\Psi, N - \Psi)$ configuration of people who participate or not. For fixed regret, it can be
anticipated that the comfort level will settle around $(N_T^a, N - N_T^a)$.

III. Characterization of equilibria 

In this section we provide the exact characterization of the equilibria induced by the game: we 
distinguish pure Nash equilibria and mixed Nash equilibria. 

A. Pure Nash Equilibrium 

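The definition of a Nash Equilibrium in pure strategies for our game requires the following two
conditions to be satisfied:

$$U(S, N_T) \geq U(T, N_T + 1) \qquad (3)$$

$$U(S, N_T - 1) \leq U(T, N_T) \qquad (4)$$

where, with a slight abuse of notation, $U(S, N_T)$ denotes the utility of a silent node when $N_T$ nodes are active. Condition (3) states that a silent node cannot gain by becoming active, and condition (4) that an active node cannot gain by becoming silent. Thus, no player can improve its utility by unilaterally deviating from the equilibrium.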

Proposition 1: Under Assumption A, there exists a pure Nash Equilibrium for our game. Moreover,

(i) for Scenario 1, there exists a unique NE, obtained when exactly $\Psi$ nodes among the total population
play T;

(ii) for Scenario 2, there exist two Nash equilibria, obtained when the total number of active
relays satisfies $N_T \in \{N_T^a, N_T^a - 1\}$.

Proof: Scenario 1: First, we show that $N_T = \Psi$ is a pure Nash equilibrium:

$$U(S, \Psi) = -U(T, \Psi) = 0 \geq U(T, \Psi + 1),$$

which is the first condition (3). In the same way,

$$U(S, \Psi - 1) = -U(T, \Psi - 1) \leq 0 = U(T, \Psi),$$

and we have the second condition (4).

Second, we show that any pure NE satisfies $(N_T, N_S) = (\Psi, N - \Psi)$. By contradiction, let $N_T > \Psi$; then
$U(S, N_T) \geq U(T, N_T + 1)$, i.e., (3) holds. However,

$$U(S, N_T - 1) = -U(T, N_T - 1) \geq 0 > U(T, N_T)$$

and (4) fails. Conversely, let $N_T < \Psi$; then $U(S, N_T - 1) \leq U(T, N_T)$, so that (4) holds. But

$$U(S, N_T) = -U(T, N_T) < 0 \leq U(T, N_T + 1)$$

and (3) fails. Hence, $N_T = \Psi$ is the only possible pure Nash equilibrium.
Scenario 2: Let $N_T \in \{N_T^a - 1, N_T^a\}$; we have

$$U(S, N_T) = -a = U(T, N_T^a) \geq U(T, N_T + 1),$$
$$U(S, N_T - 1) = -a = U(T, N_T^a) \leq U(T, N_T),$$

where equality holds in the first relation for $N_T = N_T^a - 1$ and in the second for $N_T = N_T^a$. We show
that if $N_T \notin \{N_T^a - 1, N_T^a\}$, then for the profile $(N_T, N - N_T)$ either (3) or (4) fails. In fact, if $N_T > N_T^a$ we have

$$U(S, N_T) = -a = U(T, N_T^a) > U(T, N_T + 1), \text{ but}$$
$$U(S, N_T - 1) = -a = U(T, N_T^a) > U(T, N_T),$$

so (4) fails. Second, if $N_T < N_T^a - 1$ we have

$$U(S, N_T - 1) = -a = U(T, N_T^a) < U(T, N_T), \text{ but}$$
$$U(S, N_T) = -a = U(T, N_T^a) < U(T, N_T + 1),$$

so (3) fails, which concludes the proof for the second scenario. ■
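Conditions (3) and (4) can also be checked by brute force, which gives a quick sanity check of Proposition 1. The sketch below enumerates all $N_T \in \{0, \dots, N\}$ and treats a condition as vacuous when nobody is silent (for (3)) or nobody is active (for (4)); that boundary handling, the toy utility $U(T, k) = \Psi - k$ (decreasing, with $U(T, \Psi) = 0$) and the function name are our own illustrative assumptions. Following the convention above, the silent utility is passed as a function of the number of active nodes.

```python
from typing import Callable, List


def pure_ne_active_counts(u_t: Callable[[int], float],
                          u_s: Callable[[int], float], n: int) -> List[int]:
    """All N_T in {0, ..., n} that satisfy conditions (3) and (4).

    u_t(k): utility of an active node when k nodes are active.
    u_s(k): utility of a silent node when k nodes are active.
    """
    counts = []
    for nt in range(n + 1):
        # (3): no silent node gains by turning active (vacuous if nobody is silent)
        no_gain_s_to_t = nt == n or u_s(nt) >= u_t(nt + 1)
        # (4): no active node gains by turning silent (vacuous if nobody is active)
        no_gain_t_to_s = nt == 0 or u_s(nt - 1) <= u_t(nt)
        if no_gain_s_to_t and no_gain_t_to_s:
            counts.append(nt)
    return counts


if __name__ == "__main__":
    psi, n, a = 10, 30, 3.0

    def u_t(k: int) -> float:
        return psi - k  # toy decreasing utility with U(T, psi) = 0 (assumption)

    print(pure_ne_active_counts(u_t, lambda k: -u_t(k), n))  # Scenario 1: [10]
    print(pure_ne_active_counts(u_t, lambda k: -a, n))       # Scenario 2: [12, 13]
```

With $\Psi = 10$, $N = 30$ and $a = 3$ we get $N_T^a = 13$, and the enumeration returns $N_T = 10$ for Scenario 1 and $N_T \in \{12, 13\}$ for Scenario 2, in agreement with Proposition 1.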
Remark 3: A crucial design issue is how to relate the parameters of the game to the performance of
the DTN at the equilibrium. From (1), the number of active nodes required to attain $D^*_{succ}$ is

$$N_T^* = \frac{\log(1 - D^*_{succ})}{\log Q_\tau}.$$

Besides, from Proposition 1, it must be that $\Psi = N_T^*$. Replacing in (2), we obtain:

$$r = g\tau \, \frac{1}{N_s P_{succ}(T, N_T^*)}.$$

The message reward $r$ at the equilibrium is thus proportional to the energy cost $g$ through a positive constant.
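The two steps of Remark 3 can be sketched numerically. For illustration only, the code assumes exponential relay-destination meetings of rate $\lambda$ under two-hop routing, so that $Q_\tau = e^{-\lambda\tau}$ and, if only the first relay to reach the destination is rewarded, $P_{succ}(T, k) = (1 - e^{-k\lambda\tau})/k$. These forms, the parameter values and the function names are our assumptions. Since $N_T^*$ must be an integer, we also round the ratio up, which the remark does not specify.

```python
import math


def p_succ_two_hop(k: int, lam: float, tau: float) -> float:
    """ASSUMED P_succ(T, k): the first of k copy holders to meet the destination
    within tau is rewarded; meetings are exponential with rate lam."""
    return (1.0 - math.exp(-k * lam * tau)) / k


def design_reward(d_target: float, lam: float, tau: float,
                  n_sources: int, g: float):
    """Remark 3: N_T* from (1), then the per-source reward r from (2)."""
    q_tau = math.exp(-lam * tau)  # ASSUMED: Q_tau for exponential meetings
    # ceil is our addition: N_T* must be an integer that meets the target
    n_star = math.ceil(math.log(1.0 - d_target) / math.log(q_tau))
    reward = g * tau / (n_sources * p_succ_two_hop(n_star, lam, tau))
    return n_star, reward


if __name__ == "__main__":
    n_star, r = design_reward(d_target=0.99, lam=0.001, tau=100.0, n_sources=1, g=0.01)
    print(n_star, r)
```

For $D^*_{succ} = 0.99$ with $\lambda\tau = 0.1$, $g\tau = 1$ and $N_s = 1$, this gives $N_T^* = 47$ and $r = 47/(1 - e^{-4.7}) \approx 47.4$.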

B. Mixed Nash Equilibrium 

Let us now consider relay nodes that maintain a probability distribution over the two actions. Compared
to the pure strategy game, in the mixed strategy game every node can define a strategy by which it is
active only for a fraction of the time and stays silent the rest of the time. This kind of equilibrium
is desirable for a homogeneous population of nodes with similar energy constraints.

In the mixed strategy game, node $i$ can choose to play action T with probability $p_i$ and play S with
probability $(1 - p_i)$. We let $p = (p_1, p_2, \dots, p_N)$, $p_i \geq 0\ \forall i$, be the mixed strategy profile of our game. If
$0 < p_i < 1\ \forall i$, then $p$ is a fully mixed strategy profile of the game. A standard companion notation
that we use for $p$ is $(p_i, p_{-i})$: it denotes the strategy profile of the game when relay $i$ uses strategy $p_i$
and the others use $p_{-i} = (p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_N)$. Let us denote by $V^i(p, p_{-i})$ the utility of node $i$ playing
action T with probability $p$. We have the following definition of the mixed strategy Nash Equilibrium:
Definition 1: (i) A mixed strategy Nash Equilibrium specifies a mixed strategy $p_i^* \in [0, 1]$ for each
player $i$ (where $i = 1, \dots, N$) such that

$$V^i(p_1^*, \dots, p_{i-1}^*, p_i^*, p_{i+1}^*, \dots, p_N^*) \geq V^i(p_1^*, \dots, p_{i-1}^*, p_i, p_{i+1}^*, \dots, p_N^*) \qquad (5)$$
for every mixed strategy $p_i \in [0, 1]$.

(ii) We call fully mixed Nash Equilibrium a mixed strategy Nash equilibrium $p = (p_1, \dots, p_i, \dots, p_N)$
with $p_i \notin \{0, 1\}\ \forall i$.

From now on, we use the term 'mixer' for a relay that uses a mixed strategy $0 < p_i < 1$.
The following proposition states that, at a mixed equilibrium, all mixers use the same probability; in particular,
any fully mixed equilibrium $p$ with $p_i \notin \{0, 1\}\ \forall i$ is symmetric, i.e., $p_i = p\ \forall i$. This result comes from the fact that,
given any pair of mixers, a player is better off if the other chooses differently. Moreover, at the equilibrium
each player must be indifferent between being active or silent.

Proposition 2: Assume that Assumption A holds. Let $p$ be a mixed strategy profile of our game such that $p_i \notin
\{0, 1\}$ for every mixer $i$; then, at the equilibrium, all mixers must use the same probability $p$, i.e., $p_i = p_j$ for all mixers $i, j$.

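Proof: Assume that the set of mixers is not empty, and suppose that there are $l$ relays that select pure
strategy T and $r$ that select pure strategy S. Without loss of generality, let the strategy profile at the equilibrium be

$$p = (p_1, \dots, p_{N-l-r}, 1, \dots, 1, 0, \dots, 0).$$

Scenario 1: The utility of a mixer relay $i$ writes

$$V^i(p_i, p_{-i}) = (2p_i - 1)\, F(p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_{N-l-r})$$

with

$$F(p_{-i}) = \prod_{j \neq i} (1 - p_j)\, U(T, l+1) + \sum_{j \neq i} p_j \prod_{j' \notin \{i, j\}} (1 - p_{j'})\, U(T, l+2) + \sum_{\{j, j'\} \not\ni i} p_j p_{j'} \prod_{j'' \notin \{i, j, j'\}} (1 - p_{j''})\, U(T, l+3) + \dots + \prod_{j \neq i} p_j\, U(T, N-r),$$

where all indices range over the $N - l - r$ mixers. Note that this function has the following properties:

• $F$ is strictly decreasing in any unilateral increase of $p_j$ by node $j$. This comes from the fact that
the utility of an active node is decreasing in the number of active nodes (Assumption A).

• For any two mixers $j \neq j'$, $p_j$ and $p_{j'}$ are interchangeable variables in $F$.

At the mixed equilibrium $p$, $\partial V^i / \partial p_i = 0$ for all $i \in \{1, \dots, N - l - r\}$. This implies that

$$F(p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_{N-l-r}) = 0, \quad \forall \text{ mixer } i.$$

Now suppose that there exist two mixers $i$ and $j$ such that $p_i^* \neq p_j^*$. Without loss of generality, assume that
$p_i^* < p_j^*$; then

$$0 = F(p_1, \dots, p_{i-1}, p_{i+1}, \dots, p_j, \dots, p_{N-l-r}) < F(p_1, \dots, p_i, \dots, p_{j-1}, p_{j+1}, \dots, p_{N-l-r}) = 0,$$

where the strict inequality holds because, by interchangeability, the two arguments differ only in that $p_j$ is
replaced by the smaller value $p_i$, and $F$ is strictly decreasing. This is absurd. Thus, $p_i = p_j$ for all mixers $i, j$.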
Scenario 2: As in Scenario 1, the utility perceived by a given mixer $i$ when the strategy profile is
$p = (p_1, p_2, \dots, p_N)$ is given by

$$V^i(p_i, p_{-i}) = p_i\, F(p_{-i}) - (1 - p_i)\, a,$$

with $F$ defined as in Scenario 1. At the equilibrium we have, for every mixer $i$, $\partial V^i / \partial p_i = F'(p_{-i}) = 0$, where $F'$ has exactly the same shape as
$F$ with $U(T, k)$ replaced by $U(T, k) + a$, $k \in \{l + 1, \dots, N - r\}$. We then use the same reasoning as
with function $F$ and conclude that $p_i^* = p_j^*$ for all mixers $i, j$. ■

In the following corollary, we restrict the result of Proposition 2 to the special case where every node
acts as a mixer.

Corollary 1: Under Assumption A, any fully mixed equilibrium $p$ with $p_i \notin \{0, 1\}\ \forall i$ is symmetric,
i.e., $p_i = p\ \forall i$.

The following proposition characterizes the existence and uniqueness of a fully mixed Nash Equilibrium.

Proposition 3: Under Assumption A, there exists a unique fully mixed Nash Equilibrium $p^*$. Moreover,
$p^*$ is the solution to (where $C_n^k = \binom{n}{k}$):

• Scenario 1:

$$A(N, p^*) = \sum_{k=1}^{N} C_{N-1}^{k-1} (p^*)^{k-1} (1 - p^*)^{N-k}\, U(T, k) = 0. \qquad (6)$$

• Scenario 2:

$$A'(N, p^*) = \sum_{k=1}^{N} C_{N-1}^{k-1} (p^*)^{k-1} (1 - p^*)^{N-k}\, [U(T, k) + a] = 0.$$

Proof: Let $p$ be the symmetric mixed strategy adopted by every node in the game, $p_i = p\ \forall i$.
Scenario 1: The utility of one relay $i$ when the strategy profile $(p_i, p_{-i})$ is played is given by

$$V^i(p_i, p_{-i}) = p_i \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(T, k) + (1 - p_i) \sum_{k=0}^{N-1} C_{N-1}^{k} p_{-i}^{k} (1 - p_{-i})^{N-1-k}\, U(S, k+1)$$
$$= p_i \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(T, k) + (1 - p_i) \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(S, k)$$
$$= (2p_i - 1) \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(T, k).$$

Let

$$A(N, p_{-i}) = \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(T, k).$$

If $A(N, p_{-i}) < 0$, $p_i = 0$ is the best response for player $i$; conversely, $p_i = 1$ is a best response when
$A(N, p_{-i}) > 0$. A mixed strategy is obtained when $A(N, p_{-i}) = 0$. Also, we have

$$A(N, 0) = U(T, 1) > 0 > A(N, 1) = U(T, N);$$

thus there exists a mixed symmetric Nash Equilibrium, which is unique since $A(N, p_{-i})$ is strictly
decreasing in $p$. The mixed equilibrium is thus characterized by equation (6).

Scenario 2: The utility of one relay $i$ when the strategy profile $(p_i, p_{-i})$ is played is given by

$$V^i(p_i, p_{-i}) = p_i \sum_{k=1}^{N} C_{N-1}^{k-1} p_{-i}^{k-1} (1 - p_{-i})^{N-k}\, U(T, k) - (1 - p_i)\, a.$$

At the Nash equilibrium we have, for every player $i$, $\partial V^i / \partial p_i = A'(N, p^*) = 0$, with

$$A'(N, p) = \sum_{k=1}^{N} C_{N-1}^{k-1} p^{k-1} (1 - p)^{N-k}\, [U(T, k) + a].$$

Since $a$ is a fixed positive constant, $A'(N, p^*)$ has the same properties as $A(N, p^*)$ in the proof
of Scenario 1. We then easily conclude that $p^*$ is unique and characterized by

$$A'(N, p^*) = \sum_{k=1}^{N} C_{N-1}^{k-1} (p^*)^{k-1} (1 - p^*)^{N-k}\, [U(T, k) + a] = 0.$$

■
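Equation (6) and its Scenario 2 counterpart can be solved numerically by bisection, using the endpoint signs $A(N, 0) > 0 > A(N, 1)$ and the monotonicity in $p$ established in the proof above. The sketch below is illustrative; the function names and the toy utility in the usage lines are our assumptions.

```python
from math import comb
from typing import Callable

Utility = Callable[[int], float]


def a_fun(p: float, n: int, u_t: Utility, a: float = 0.0) -> float:
    """A(N, p) of Eq. (6) when a == 0 (Scenario 1); A'(N, p) when a > 0 (Scenario 2)."""
    return sum(comb(n - 1, k - 1) * p ** (k - 1) * (1 - p) ** (n - k) * (u_t(k) + a)
               for k in range(1, n + 1))


def fully_mixed_ne(n: int, u_t: Utility, a: float = 0.0, tol: float = 1e-12) -> float:
    """Bisection for the unique p* with A(N, p*) = 0 (or A'(N, p*) = 0)."""
    lo, hi = 0.0, 1.0
    # A(N, 0) = U(T, 1) + a and A(N, 1) = U(T, N) + a must have opposite signs
    if not a_fun(lo, n, u_t, a) > 0 > a_fun(hi, n, u_t, a):
        raise ValueError("need U(T, 1) + a > 0 > U(T, N) + a")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if a_fun(mid, n, u_t, a) > 0:
            lo = mid  # being active still pays off in expectation: raise p
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    def toy_u(k: int) -> float:
        return 10.0 - k  # illustrative U(T, k) with Psi = 10 (assumption)

    print(fully_mixed_ne(30, toy_u))         # close to 9/29
    print(fully_mixed_ne(30, toy_u, a=3.0))  # close to 12/29
```

For the toy utility $U(T, k) = \Psi - k$, (6) reduces to $\Psi - 1 - (N - 1)p^* = 0$, because $k - 1$ is binomially distributed with mean $(N - 1)p^*$; with $\Psi = 10$ and $N = 30$ the solver returns $p^* \approx 9/29$, and with $a = 3$ it returns $p^* \approx 12/29$.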



C. Equilibrium with mixers and non-mixers 

We study here the existence of equilibria when the population of agents is composed of pure
strategy players (active or non-active) as well as mixers. In this case, a non-pure Nash equilibrium
can be represented by the triplet $(l, r, p^*)$, where $l, r \in \{0, 1, \dots, N\}$ denote respectively the number of
agents choosing pure strategy T or S, and $p^* \in (0, 1)$ is the probability with which the remaining $N - l - r$
mixers choose strategy T. Moreover, we denote by $v_T(l, r, p)$ (resp. $v_S(l, r, p)$) the expected payoff to a
player choosing T (resp. S). For the zero-sum utility, the expressions of $v_T(l, r, p)$ and $v_S(l, r, p)$ are as follows:

$$v_T(l, r, p) = \sum_{k=0}^{N-l-r} C_{N-l-r}^{k}\, p^k (1 - p)^{N-l-r-k}\, U(T, l + k) \qquad (7)$$

and

$$v_S(l, r, p) = -\sum_{k=0}^{N-l-r} C_{N-l-r}^{k}\, p^k (1 - p)^{N-l-r-k}\, U(T, l + k) \qquad (8)$$

Proposition 4: Using the previous notation, a strategy profile of type $(l, r, p^*)$ is a Nash equilibrium
with at least one mixer if and only if

$$v_T(l + 1, r, p^*) = v_S(l, r + 1, p^*). \qquad (9)$$

We prove that this result holds for both the zero-sum utility and the fixed regret utility of non-active nodes
(Scenario 1 and Scenario 2, respectively).

Proof: Condition (9) states that a mixer is indifferent between choosing pure strategy T or
S. This is a necessary condition for the strategy profile $(l, r, p^*)$ to be a Nash equilibrium.

To show sufficiency, we need to show that pure strategy players, too, cannot improve their
expected utility through unilateral deviation from the equilibrium profile. Without loss of generality,
suppose that there is at least one player using pure strategy T. In Scenario 1 we have

$$v_T(l, r, p^*) \geq v_T(l + 1, r, p^*) = v_S(l, r + 1, p^*) \geq v_S(l - 1, r + 1, p^*),$$

where both inequalities follow from Assumption A, and therefore, for any $p \in [0, 1)$,

$$v_T(l, r, p^*) \geq p\, v_T(l, r, p^*) + (1 - p)\, v_S(l - 1, r + 1, p^*).$$

This last relation states that an active user cannot improve its expected utility by unilaterally deviating
from the strategy profile $(l, r, p^*)$ using any strategy $p \in [0, 1)$, given relation (9). As done for Scenario
1, in Scenario 2 we have $v_S(l, r + 1, p^*) = -a$ and $v_S(l - 1, r + 1, p^*) = -a$; then

$$v_T(l, r, p^*) \geq v_T(l + 1, r, p^*) = -a = v_S(l - 1, r + 1, p^*),$$

so that $v_T(l, r, p^*) \geq p\, v_T(l, r, p^*) + (1 - p)\, v_S(l - 1, r + 1, p^*)$ for any $p \in [0, 1)$; moreover, for a player using pure strategy S,

$$v_T(l + 1, r - 1, p^*) \leq v_T(l + 1, r, p^*) = -a = v_S(l, r, p^*).$$

This completes the proof. ■
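Relation (9) lends itself to the same numerical treatment. The sketch below implements (7) and (8) for the zero-sum utility and solves (9) for given $(l, r)$ by bisection. It relies on $v_T(l + 1, r, p) - v_S(l, r + 1, p)$ being non-increasing in $p$ when $U(T, \cdot)$ is decreasing, and it returns `None` when no root exists in $(0, 1)$. The function names and the toy utility are our assumptions.

```python
from math import comb
from typing import Callable, Optional

Utility = Callable[[int], float]


def v_t(l: int, r: int, p: float, n: int, u_t: Utility) -> float:
    """Eq. (7): expected payoff of a T player with l pure-T players, r pure-S players
    and n - l - r mixers, each active with probability p."""
    m = n - l - r  # number of mixers
    return sum(comb(m, k) * p**k * (1 - p) ** (m - k) * u_t(l + k) for k in range(m + 1))


def v_s(l: int, r: int, p: float, n: int, u_t: Utility) -> float:
    """Eq. (8), zero-sum utility (Scenario 1): v_S = -v_T."""
    return -v_t(l, r, p, n, u_t)


def mixer_indifference_p(l: int, r: int, n: int, u_t: Utility,
                         tol: float = 1e-12) -> Optional[float]:
    """Solve (9), v_T(l+1, r, p) = v_S(l, r+1, p), for p in (0, 1) by bisection.

    Returns None when the difference does not change sign on [0, 1].
    """
    def h(p: float) -> float:
        return v_t(l + 1, r, p, n, u_t) - v_s(l, r + 1, p, n, u_t)

    lo, hi = 0.0, 1.0
    if not h(lo) > 0 > h(hi):
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    def toy_u(k: int) -> float:
        return 10.0 - k  # illustrative U(T, k) with Psi = 10 (assumption)

    print(mixer_indifference_p(2, 5, 30, toy_u))   # close to 15/44
    print(mixer_indifference_p(10, 0, 30, toy_u))  # None: l >= Psi
    print(mixer_indifference_p(2, 27, 30, toy_u))  # None: l + r + 1 = N
```

For the toy utility $U(T, k) = 10 - k$ with $N = 30$, the pair $(l, r) = (2, 5)$ yields $p^* = 15/44$, whereas $(l, r) = (10, 0)$ and $(l, r) = (2, 27)$ yield no solution, consistent with the discussion that follows.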

Discussion on the existence of $(l, r, p^*)$-type equilibria

It is possible to isolate several cases where relation (9), which characterizes a Nash Equilibrium of
type $(l, r, p^*)$, cannot be satisfied.

We denote by $p = 0^+$ (resp. $p = 1^-$) the mixed strategy infinitely close to zero (resp. to one) with which
at least one mixer selects to be active. Since $v_T(l, r, p^*)$ is strictly decreasing in $l$ and $p^*$, we have

$$v_T(l + 1, r, p^*) = v_S(l, r + 1, p^*)$$
$$\Rightarrow \; v_T(l + 1, r, 0^+) > -v_T(l, r + 1, 0^+) \ \text{ and } \ v_T(l + 1, r, 1^-) < -v_T(l, r + 1, 1^-).$$

(1) If $l \geq \Psi$, then there is no Nash equilibrium of the desired type. Indeed, if $l \geq \Psi$, then $v_T(l, r + 1, 0^+) \leq 0$
and

$$v_T(l + 1, r, 0^+) < 0 \leq -v_T(l, r + 1, 0^+).$$

Then there is no possible Nash Equilibrium according to relation (9).

(2) If $l + r + 1 > N - 1$, then there is no Nash equilibrium. We already have $l < \Psi$; let $l + r + 1 = N$;
then

$$v_T(l + 1, r, p) = C_1 \ \ \forall p \quad \text{and} \quad v_S(l, r + 1, p) = C_2 \ \ \forall p.$$

Since $v_T$ is decreasing in $l$ and $l < \Psi$, we have $C_2 = -U(T, l) < 0 \leq U(T, l + 1) = C_1$, which contradicts relation (9).
A Nash Equilibrium of type $(l, r, p^*)$ then exists only for $l < \Psi$ and for $l + r < N - 2$; thus there
are exactly $\Psi(N - 2) - \frac{\Psi(\Psi - 1)}{2}$ Nash equilibria of this type. In the following proposition we go further and derive
some properties of the symmetric mixed strategy $p^*$ at the equilibrium.

Proposition 5: The mixed strategy $p^*$ at the equilibrium increases as $r$ increases and, conversely,
decreases as $l$ increases.

Proof: For a fixed number $l$ of nodes playing pure strategy T, the utility of a mixer when fewer
nodes play pure strategy S decreases faster than when more nodes play pure strategy
S. For example, we have

$$\frac{\partial v_T(l + 1, 0, p)}{\partial p} \leq \frac{\partial v_T(l + 1, 1, p)}{\partial p}.$$

Similarly, we have

$$\frac{\partial v_T(l, 1, p)}{\partial p} \leq \frac{\partial v_T(l, 2, p)}{\partial p}.$$

Since $v_T(l + 1, 0, 0^+) = v_T(l + 1, 1, 0^+)$ and $v_T(l, 1, 0^+) = v_T(l, 2, 0^+)$, if $p_1^*, p_2^*$ are such that
$v_T(l + 1, 0, p_1^*) = -v_T(l, 1, p_1^*)$ and $v_T(l + 1, 1, p_2^*) = -v_T(l, 2, p_2^*)$, it follows that $p_1^* < p_2^*$.
The same reasoning holds for every $k < k'$ and $p_1^*, p_2^*$ such that $v_T(l + 1, k, p_1^*) = -v_T(l, k + 1, p_1^*)$ and
$v_T(l + 1, k', p_2^*) = -v_T(l, k' + 1, p_2^*)$: then $p_1^* < p_2^*$.

We apply a similar reasoning in reverse and conclude that, for a fixed number $r$ of nodes playing pure
strategy S, for every $k < k'$ and $p_1^*, p_2^*$ such that $v_T(k + 1, r, p_1^*) = -v_T(k, r + 1, p_1^*)$ and
$v_T(k' + 1, r, p_2^*) = -v_T(k', r + 1, p_2^*)$, we have $p_1^* > p_2^*$. ■
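As a worked illustration of Proposition 5, assume a linear utility $U(T, k) = \Psi - k$ in Scenario 1 (our choice for illustration, not a model used in the paper). Relation (9) can then be solved in closed form, and both monotonicity claims can be read off directly:

```latex
% Assumption (illustration only): U(T,k) = \Psi - k, zero-sum utility (Scenario 1).
% With K \sim \mathrm{Bin}(N-l-r,\,p), Eq. (7) and Eq. (8) give
v_T(l,r,p) = \Psi - l - (N-l-r)\,p, \qquad v_S(l,r,p) = -v_T(l,r,p).
% Relation (9), writing M = N-l-r-1:
\Psi - l - 1 - M p^{*} = -\left(\Psi - l - M p^{*}\right)
\;\Longrightarrow\;
p^{*} = \frac{2(\Psi - l) - 1}{2(N - l - r - 1)}.
% For l < \Psi the numerator is positive; increasing r shrinks the denominator,
% so p^{*} grows with r.
% Increasing l by one lowers numerator and denominator by 2 each, and
\frac{2(\Psi - l) - 3}{2(N - l - r - 2)} < \frac{2(\Psi - l) - 1}{2(N - l - r - 1)}
\iff p^{*}(l) < 1 \qquad (\text{for } N - l - r - 2 > 0),
% so p^{*} decreases with l whenever it is a valid probability.
```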

Summary on the characterization of equilibria. Throughout this section we have characterized the
following equilibria under Assumption A:

1. Pure equilibrium: We showed that for the zero-sum utility there exists a unique pure NE, which sets
at exactly $\Psi$ active relay nodes. For the fixed regret utility scenario, there exist two possible NE,
with a number of active nodes in $\{N_T^a, N_T^a - 1\}$.

2. Fully mixed equilibrium: For both scenarios we showed that any fully mixed equilibrium $p$ with
$p_i \notin \{0, 1\}\ \forall i$ is symmetric. Moreover, the mixed NE of our game is unique and characterized
by $A(N, p^*) = \sum_{k=1}^{N} C_{N-1}^{k-1} (p^*)^{k-1} (1 - p^*)^{N-k}\, U(T, k) = 0$ for the zero-sum utility scenario, and
by $A'(N, p^*) = \sum_{k=1}^{N} C_{N-1}^{k-1} (p^*)^{k-1} (1 - p^*)^{N-k}\, [U(T, k) + a] = 0$ for the fixed regret
scenario.

3. Equilibrium with mixers and non-mixers: The last type of equilibrium characterized is related to
a population of relays composed of mixers and non-mixers. Here we showed that such an
equilibrium is characterized by a specific relation, namely relation (9). Moreover, we established
that a Nash Equilibrium of this type exists only for $l < \Psi$ and for $l + r < N - 2$; thus there are
exactly $\Psi(N - 2) - \frac{\Psi(\Psi - 1)}{2}$ Nash equilibria of this type.

IV. The multi-class case 

In the first part of the paper we adhered to a common simplifying assumption of many earlier
works on modeling the performance of DTNs, i.e., we assumed that DTN nodes all have similar physical
characteristics, e.g., transmission range, mobility patterns, energy capacities, etc.; that is, the DTN is
homogeneous. In this section we design a model that allows fairness among mobile relays based on their
capacities, extending our results to DTNs with several classes of nodes. In fact, DTN nodes may belong
to different categories, e.g., mobile phones, laptops, PDAs, and/or have specific communication/energy-autonomy
features depending on transmission range, mobility, memory, energy capacity and active radio interface,
such as WiFi and Bluetooth. A DTN with different types of nodes is classified as heterogeneous [6],
[8].

To this end, we assume that nodes fall into classes according to their physical characteristics: the
aspect we focus on is the heterogeneity of the nodes' energy budget/consumption. For example, devices
using a Bluetooth radio instead of WiFi consume between 10 and 50 times less power [11]. More precisely,
the power consumption of a WiFi interface in an active data transfer state is of the order of 890 mW, compared
to only 120 mW for Bluetooth, owing to the latter's limited range and simpler radio architecture. For small devices
such as cell phones and PDAs, with limited power budgets, the power consumption of a WiFi radio
represents a significant proportion of the overall system power [19], [20], [1].

The extension of the game is done by devising a class-dependent reward mechanism. In fact, nodes of classes with larger battery capacity might choose to be more active to collect the reward, while nodes of classes with limited battery capacity may participate less in order to save energy. As before, the sources wish to satisfy performance requirements in a way that conserves energy and achieves fairness in energy consumption.

A. The model 

Heterogeneous DTNs considered in this section are composed of $M$ classes of relay nodes: class $j$, $1 \le j \le M$, contains $N_j$ nodes with inter-meeting intensity $\lambda_j > 0$, and $\sum_{j=1}^{M} N_j = N$. We let each class $j$ have its own threshold $\Psi_j$, which defines the majority/minority of nodes from class $j$. We will often refer to the case $M = 2$ for the sake of clarity; the results shown later extend to the general case unless otherwise stated.

The energy consumed by active nodes, i.e., nodes playing $T$, has a large impact on the lifetime of battery-operated mobile nodes due to the limited energy budget in DTNs. This depletion of energy depends not only on the wireless technology used by the nodes of each class but also on the type of these nodes (the rate at which energy is consumed by PDA-based phones is very high compared to laptops, so these devices can quickly drain their own batteries). Let $g_j$ denote the energy cost for a relay node of class $j$ when it remains active during a unit of time; we consider that the inter-meeting intensity is the same for all classes, i.e., $\lambda_j = \lambda$, $\forall 1 \le j \le M$.¹ For the case $M = 2$ we assume that $g_1 > g_2$, so that nodes of class 1 have higher energy requirements than nodes of class 2 to be active.

The utility function for an active node of class $j$ is:
$$U_j(T, N_T) = r_j P_{succ}(T, N_T) - g_j \tau,$$
while the utility for a silent node is:
$$U_j(S, N_T) = -r_j P_{succ}(T, N_T) + g_j \tau.$$
The thresholds $\Psi_j$ as previously defined satisfy the following relation:
$$\forall 1 \le j \le M: \quad r_j P_{succ}(T, \Psi_j) = g_j \tau \qquad (10)$$
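Since the number of active relays is an integer, relation (10) rarely holds with equality. A minimal sketch, assuming the two-hop model of Section VI for $P_{succ}$ (which is decreasing in $N_T$), takes $\Psi_j$ as the largest $n$ with $r_j P_{succ}(T, n) \ge g_j \tau$. The rewards and costs are hypothetical values chosen so that $g_j\tau/r_j$ equals $1/10.5$ for class 1 and $1/20.5$ for class 2, and `class_threshold` is an illustrative name.

```python
import math

LAM, TAU = 0.03, 100.0


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def class_threshold(r_j: float, g_j: float, n_max: int) -> int:
    """Integer version of (10): largest n with r_j * P_succ(T, n) >= g_j * tau."""
    psi = 0
    for n in range(1, n_max + 1):
        if r_j * p_succ(n) >= g_j * TAU:
            psi = n  # P_succ decreases in n, so the last n that passes is the largest
    return psi


rewards = (0.2, 0.1)                              # hypothetical r_1, r_2
costs = (0.2 / (10.5 * TAU), 0.1 / (20.5 * TAU))  # g_j * tau / r_j = 1/10.5 and 1/20.5
print([class_threshold(r, g, 40) for r, g in zip(rewards, costs)])  # [10, 20]
```

The class with the larger cost-to-reward ratio $g_j\tau/r_j$ ends up with the smaller threshold.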

B. Characterizing the equilibria 

Proposition 6: In the multi-class framework, there exists a unique pure NE, attained when exactly $\Psi_j$ relays of each class $j \in \{1, \dots, M\}$ select to be active.

Proof: The Nash equilibrium is obtained when the following two conditions are satisfied:
$$\forall 1 \le j \le M: \quad \begin{cases} U_j(S, N_T) \ge U_j(T, N_T + 1) \\ U_j(S, N_T - 1) \le U_j(T, N_T) \end{cases} \qquad (11)$$
Assume that for any class $j$ exactly $\Psi_j$ nodes are active; then we have:
$$U_j(S, \Psi_j) = -U_j(T, \Psi_j) = 0 > U_j(T, \Psi_j + 1),$$
and in the same way:
$$U_j(S, \Psi_j - 1) = -U_j(T, \Psi_j - 1) < 0 = U_j(T, \Psi_j),$$
so the conditions in (11) are satisfied.

We now show that there are no other pure Nash equilibria. For a class $j$, let $\Psi'_j \neq \Psi_j$ be the number of active nodes; without loss of generality, let $\Psi'_j > \Psi_j$. Then $U_j(S, \Psi'_j) > U_j(T, \Psi'_j + 1)$, so the first condition of (11) holds. However,
$$U_j(S, \Psi'_j - 1) = -U_j(T, \Psi'_j - 1) \ge 0 > U_j(T, \Psi'_j),$$
and the second condition is not satisfied. Continuing with the same reasoning used in the proof of Proposition 1, we obtain that at the equilibrium there are exactly $\Psi_j$ active nodes, hence the proof. ■

¹Future extensions of the model will account for heterogeneity in the inter-meeting intensities [9], [6].
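The two conditions of (11) can be checked by brute force for one class. In this hypothetical sketch (two-hop $P_{succ}$ of Section VI, $N = 40$, $r = 0.1$), the cost $g$ is picked so that (10) holds exactly at $\Psi = 15$, and the zero-sum relation $U(S, n) = -U(T, n)$ is used; only $n = \Psi$ should pass both tests. The boundary case $n = 1$ is skipped because the second condition would need $U(S, 0)$; the helper names are illustrative.

```python
import math

LAM, TAU = 0.03, 100.0


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def u_active(n: int, r: float, g: float) -> float:
    """Utility of an active relay when n relays are active."""
    return r * p_succ(n) - g * TAU


def u_silent(n: int, r: float, g: float) -> float:
    """Zero-sum utility of a silent relay."""
    return -u_active(n, r, g)


def is_pure_ne(n: int, r: float, g: float) -> bool:
    """Both conditions of (11) with n active relays."""
    silent_stays = u_silent(n, r, g) >= u_active(n + 1, r, g)  # no silent relay joins
    active_stays = u_silent(n - 1, r, g) <= u_active(n, r, g)  # no active relay leaves
    return silent_stays and active_stays


R = 0.1
G = R * p_succ(15) / TAU  # hypothetical cost making (10) exact at Psi = 15
equilibria = [n for n in range(2, 40) if is_pure_ne(n, R, G)]
print(equilibria)  # [15]
```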

As in the case of homogeneous DTNs, we can extend the result to mixed strategies. 

Proposition 7: Let $p = (p_{11}, \dots, p_{N_1 1}, \dots, p_{1j}, \dots, p_{N_j j}, \dots, p_{1M}, \dots, p_{N_M M})$ be a fully mixed strategy profile of our game in the multi-class framework. At the equilibrium, all players of the same class must use the same fully mixed strategy: $p_{ij} = p_j$, $\forall i$, $\forall 1 \le j \le M$; the result holds in both Scenarios 1 and 2.

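Proof: We denote by $(p_{ij}, p_{-i})$ the fully mixed strategy profile of the game when relay $i$ of class $j$ uses strategy $p_{ij}$ and the others use $p_{-i} = (p_{11}, \dots, p_{N_1 1}, \dots, p_{1j}, \dots, p_{i-1\,j}, p_{i+1\,j}, \dots, p_{N_j j}, \dots, p_{1M}, \dots, p_{N_M M})$.

Scenario 1: The utility perceived by a given player $i$ of class $j$ when the strategy profile $p$ is played is given by:
$$U_i(p) = (2p_{ij} - 1)\, F_i(p_{-i})$$
with
$$F_i = \prod_{k \neq i} (1 - p_{km})\, U_j(T, 1) + \sum_{k \neq i} p_{km} \prod_{k' \notin \{i, k\}} (1 - p_{k'm})\, U_j(T, 2) + \sum_{k, k' \neq i} p_{km} p_{k'm} \prod_{k'' \notin \{i, k, k'\}} (1 - p_{k''m})\, U_j(T, 3) + \dots + \prod_{k \neq i} p_{km}\, U_j(T, N),$$
where $m$, $1 \le m \le M$, denotes the class of relay $k$. Note the following about this function: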

• $F_i$ is strictly decreasing in any unilateral increase of $p_{km}$ by a player $k$ of class $m$.

• For any two players $k \neq k'$ of the same class $m$, the mixed strategies $p_{km}$ and $p_{k'm}$ are interchangeable variables in $F_i$.
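Both properties of $F_i$ can be verified by enumerating all action profiles of the other relays. The sketch assumes, hypothetically, four other relays, the two-hop $P_{succ}$ of Section VI, and $r = 0.1$, $g = 4 \times 10^{-4}$ for the class of relay $i$. Since $U_j$ depends on the others only through the total number of active relays, swapping the strategies of two relays of the same class leaves $F_i$ unchanged, while raising one probability lowers it.

```python
import itertools
import math

LAM, TAU = 0.03, 100.0
R, G = 0.1, 4e-4  # hypothetical reward and energy cost of relay i's class


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def big_f(p_others: list) -> float:
    """F_i(p_-i): expected U_j(T, .) of relay i over the others' action profiles."""
    total = 0.0
    for actions in itertools.product((0, 1), repeat=len(p_others)):
        prob = 1.0
        for is_active, p in zip(actions, p_others):
            prob *= p if is_active else 1 - p
        total += prob * (R * p_succ(1 + sum(actions)) - G * TAU)  # relay i is active too
    return total


base = [0.2, 0.5, 0.7, 0.4]     # strategies of four other relays
raised = [0.3, 0.5, 0.7, 0.4]   # relay 0 unilaterally increases its probability
swapped = [0.5, 0.2, 0.7, 0.4]  # relays 0 and 1 (same class) exchange strategies
print(big_f(raised) < big_f(base))                               # True
print(math.isclose(big_f(swapped), big_f(base), abs_tol=1e-12))  # True
```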

At the equilibrium we have, for every player $i$ and $\forall 1 \le j \le M$, $\frac{\partial U_i(p)}{\partial p_{ij}} = 0$, which implies $F_i(p^*_{-i}) = 0$. Moreover, the strategy profile $p^* = (p^*_{11}, \dots, p^*_{N_1 1}, \dots, p^*_{N_j j}, \dots, p^*_{1M}, \dots, p^*_{N_M M})$ is a Nash equilibrium if no user can increase its utility by any unilateral deviation. Now suppose that there exist $i$, $k$ of class $j$ such that $p^*_{ij} \neq p^*_{kj}$; without loss of generality assume that $p^*_{ij} < p^*_{kj}$. By interchangeability, $F_k(p^*_{-k})$ is obtained from $F_i(p^*_{-i})$ by replacing $p^*_{kj}$ with $p^*_{ij}$; since $F_i$ is strictly decreasing in $p_{kj}$, we have
$$0 = F_k(p^*_{-k}) = F_i(p^*_{-i})\big|_{p^*_{kj} \leftarrow p^*_{ij}} > F_i(p^*_{-i}) = 0,$$
which is absurd. Thus $p^*_{ij} = p^*_{kj}$, $\forall i, k$ of class $j$.

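Scenario 2: As in Scenario 1, the utility perceived by a given player $i$ of class $j$ when the strategy profile $p$ is played is given by:
$$U_i(p) = p_{ij}\, F_i(p_{-i}) - \alpha (1 - p_{ij}) \Big[ \prod_{k \neq i} (1 - p_{km}) + \sum_{k \neq i} p_{km} \prod_{k' \notin \{i, k\}} (1 - p_{k'm}) + \sum_{k, k' \neq i} p_{km} p_{k'm} \prod_{k'' \notin \{i, k, k'\}} (1 - p_{k''m}) + \dots + \prod_{k \neq i} p_{km} \Big],$$
where the bracket is the total probability of the actions of the other players and hence equals 1.

At the equilibrium we have, for every player $i$, $\frac{\partial U_i(p)}{\partial p_{ij}} = F'_i(p_{-i}) = 0$, where $F'_i$ has exactly the same shape as $F_i$ with $U_j(T, k)$ replaced by $U_j(T, k) + \alpha$, $k \in \{1, \dots, N\}$. We then use the same reasoning as for $F_i$ and conclude that $p^*_{ij} = p^*_{kj}$, $\forall i, k$ of class $j$. ■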
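For Scenario 2 the utility is affine in $p_{ij}$, so its slope can be compared directly with $F'_i$. A hypothetical check, enumerating the actions of four other relays under the two-hop model of Section VI with $r = 0.1$, $g = 4 \times 10^{-4}$ and $\alpha = 0.01$: the finite-difference slope of $U_i$ in $p_{ij}$ should match $F'_i(p_{-i})$, and the bracket should sum to 1. The function names are illustrative.

```python
import itertools
import math

LAM, TAU = 0.03, 100.0
R, G, ALPHA = 0.1, 4e-4, 0.01  # hypothetical reward, energy cost and regret alpha


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def profiles(p_others):
    """Yield (probability, number of active others) for every action profile."""
    for actions in itertools.product((0, 1), repeat=len(p_others)):
        prob = 1.0
        for is_active, p in zip(actions, p_others):
            prob *= p if is_active else 1 - p
        yield prob, sum(actions)


def utility(p_i, p_others):
    """Scenario 2: p_i * F_i(p_-i) - alpha * (1 - p_i) * [total probability]."""
    f_i = sum(prob * (R * p_succ(1 + m) - G * TAU) for prob, m in profiles(p_others))
    bracket = sum(prob for prob, _ in profiles(p_others))  # equals 1
    return p_i * f_i - ALPHA * (1 - p_i) * bracket


def f_prime(p_others):
    """F'_i: F_i with U_j(T, k) replaced by U_j(T, k) + alpha."""
    return sum(prob * (R * p_succ(1 + m) - G * TAU + ALPHA) for prob, m in profiles(p_others))


others = [0.2, 0.5, 0.7, 0.4]
slope = (utility(0.6, others) - utility(0.4, others)) / 0.2  # U_i is affine in p_i
print(math.isclose(slope, f_prime(others), rel_tol=1e-9, abs_tol=1e-12))  # True
```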
Let $p_j$ be the symmetric mixed strategy adopted by every node of class $j$, i.e., $p_{ij} = p_j$, $\forall i$. For the sake of clarity, we characterize the mixed strategy $p^*$ in a two-class scenario ($M = 2$), without any loss of generality.

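Proposition 8: There exists a unique fully mixed Nash equilibrium $(p_1^*, p_2^*)$ for the multi-class case. Moreover, it is the solution of $A_1(N, p_1^*, p_2^*) = A_2(N, p_1^*, p_2^*) = 0$, where:
$$A_1(N, p_1, p_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} C_{N_1-1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-1-k_1}\, C_{N_2}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-k_2}\, U_1(T, k_1 + k_2 + 1)$$
and
$$A_2(N, p_1, p_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} C_{N_1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-k_1}\, C_{N_2-1}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-1-k_2}\, U_2(T, k_1 + k_2 + 1).$$
Moreover,

(i) if $\frac{g_1}{r_1} = \frac{g_2}{r_2}$, then $p_1^* = p_2^*$;

(ii) if $\frac{g_2}{r_2} < \frac{g_1}{r_1}$, then $p_1^* < p_2^*$. As a consequence, $\Psi_1 < \Psi_2$.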
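Statement (i) can be illustrated numerically: when $g_1/r_1 = g_2/r_2$, both $A_1/r_1$ and $A_2/r_2$ reduce, on the diagonal $p_1 = p_2 = p$, to the same expectation over a Binomial$(N - 1, p)$ number of other active relays. The sketch assumes the two-hop $P_{succ}$ of Section VI, $N_1 = N_2 = 20$, $r_1 = 0.24$, $r_2 = 0.15$, and hypothetical costs with $g_j\tau/r_j = 1/30$; it bisects $A_1$ along the diagonal and checks that $A_2$ vanishes at the same point. It does not attempt the asymmetric case (ii), and the helper names are illustrative.

```python
import math

LAM, TAU = 0.03, 100.0
N1, N2 = 20, 20
R = (0.24, 0.15)                            # rewards r_1, r_2
G = (R[0] / (30 * TAU), R[1] / (30 * TAU))  # hypothetical: g_j * tau / r_j = 1/30


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


PS = [0.0] + [p_succ(n) for n in range(1, N1 + N2 + 1)]  # PS[n] = P_succ(T, n)


def binom_pmf(n, p):
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def big_a(j, p1, p2):
    """A_j(N, p1, p2) for an active relay of class j (0 for class 1, 1 for class 2)."""
    if j == 0:
        others = convolve(binom_pmf(N1 - 1, p1), binom_pmf(N2, p2))
    else:
        others = convolve(binom_pmf(N1, p1), binom_pmf(N2 - 1, p2))
    # n other relays are active, so n + 1 relays are active in total
    return sum(w * (R[j] * PS[n + 1] - G[j] * TAU) for n, w in enumerate(others))


lo, hi = 0.0, 1.0
for _ in range(60):  # bisection along the diagonal p1 = p2 = p
    mid = (lo + hi) / 2
    if big_a(0, mid, mid) > 0:
        lo = mid
    else:
        hi = mid
p = (lo + hi) / 2
print(f"p1* = p2* = {p:.3f}; A_1 = {big_a(0, p, p):.1e}, A_2 = {big_a(1, p, p):.1e}")
```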

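Proof: Scenario 1: Let $\pi_1(k_1, k_2) = C_{N_1-1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-1-k_1}\, C_{N_2}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-k_2}$ be the probability that $k_1$ other relays of class 1 and $k_2$ relays of class 2 are active. The utility of a relay $i$ of class 1 when the strategy profile $(p_{i1}, p_{-i})$ is played is given by:
$$V_i(p_{i1}, p_{-i}) = p_{i1} \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} \pi_1(k_1, k_2)\, U_1(T, k_1 + k_2 + 1) - (1 - p_{i1}) \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} \pi_1(k_1, k_2)\, U_1(T, k_1 + k_2 + 1) = (2p_{i1} - 1)\, A_1(N, p_1, p_2),$$
where the second sum uses the zero-sum relation between the silent and the active utilities. In the same way we can write the utility of a relay $i$ of class 2 as:
$$V_i(p_{i2}, p_{-i}) = (2p_{i2} - 1)\, A_2(N, p_1, p_2),$$
where $A_1(N, p_1, p_2)$ and $A_2(N, p_1, p_2)$ are defined as follows:
$$A_1(N, p_1, p_2) = \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} C_{N_1-1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-1-k_1}\, C_{N_2}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-k_2}\, U_1(T, k_1 + k_2 + 1),$$
and
$$A_2(N, p_1, p_2) = \sum_{k_1=0}^{N_1} \sum_{k_2=0}^{N_2-1} C_{N_2-1}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-1-k_2}\, C_{N_1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-k_1}\, U_2(T, k_1 + k_2 + 1).$$
As motivated in the proof of Proposition 3, a mixed Nash equilibrium $(p_1^*, p_2^*)$ is obtained here when
$$A_1(N, p_1^*, p_2^*) = A_2(N, p_1^*, p_2^*) = 0. \qquad (12)$$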
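Scenario 2: The utility of a user $i$ of class 1 is given by:
$$V_i(p_{i1}, p_{-i}) = p_{i1} \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} C_{N_1-1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-1-k_1}\, C_{N_2}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-k_2}\, U_1(T, k_1 + k_2 + 1) - (1 - p_{i1})\, \alpha$$
$$= p_{i1} \sum_{k_1=0}^{N_1-1} \sum_{k_2=0}^{N_2} C_{N_1-1}^{k_1} p_1^{k_1} (1 - p_1)^{N_1-1-k_1}\, C_{N_2}^{k_2} p_2^{k_2} (1 - p_2)^{N_2-k_2} \left[ U_1(T, k_1 + k_2 + 1) + \alpha \right] - \alpha,$$
and the utility of a user $i$ of class 2 is written in the same way. By replacing $U_j(T, k_1 + k_2 + 1)$ with $U_j(T, k_1 + k_2 + 1) + \alpha$ in the first case, we obtain the same conclusions.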
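The proof is thus similar to Scenario 1, hence the existence of a mixed Nash equilibrium. Now let
$$C(e_i) = \sum_{k_1=0}^{N_1-2} \sum_{k_2=0}^{N_2-2} P(K_1 = k_1, K_2 = k_2)\, P_{succ}(T, k_1 + k_2 + e_i + 1)$$
for user $i$, where $e_i = 1$ if user $i$ is active and $e_i = 0$ otherwise. We can thus rewrite the expressions of $A_1(N, p_1^*, p_2^*)$ and $A_2(N, p_1^*, p_2^*)$ as follows:
$$A_1(N, p_1^*, p_2^*) = r_1 p_2^* C(1) - r_1 (1 - p_2^*) C(0) - g_1 \tau \qquad (13)$$
$$A_2(N, p_1^*, p_2^*) = r_2 p_1^* C(1) - r_2 (1 - p_1^*) C(0) - g_2 \tau \qquad (14)$$
It follows that $A_1(N, p_1^*, p_2^*) = A_2(N, p_1^*, p_2^*) = 0$ is equivalent to
$$p_2^* C(1) - (1 - p_2^*) C(0) = \frac{g_1 \tau}{r_1} \qquad (15)$$
$$p_1^* C(1) - (1 - p_1^*) C(0) = \frac{g_2 \tau}{r_2} \qquad (16)$$
Letting $\frac{g_1}{r_1} = \frac{g_2}{r_2}$, we have $p_1^* = p_2^*$. This completes the proof of (i).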
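Now, let $\gamma_1 = \frac{g_1 \tau}{r_1}$ and $\gamma_2 = \frac{g_2 \tau}{r_2}$; then from (15) and (16) we have:
$$(p_2^* - p_1^*) C(1) + (p_2^* - p_1^*) C(0) = \gamma_1 - \gamma_2 \;\Rightarrow\; (p_2^* - p_1^*) \left( C(0) + C(1) \right) = \gamma_1 - \gamma_2.$$
Since $C(0) > C(1) > 0$, we get $\gamma_1 > \gamma_2 \Rightarrow p_2^* > p_1^*$. This tells us that, in order to have fewer active nodes in class 1, we should allocate a smaller reward to that class.² However, if we come back to the definition of $\Psi_1$, we have:
$$r_1 P_{succ}(T, \Psi_1) - g_1 \tau = 0 \;\Rightarrow\; P_{succ}(T, \Psi_1) = \frac{g_1 \tau}{r_1} > \frac{g_2 \tau}{r_2}$$
$$\Rightarrow\; r_2 P_{succ}(T, \Psi_1) > g_2 \tau \;\Rightarrow\; P_{succ}(T, \Psi_1) > P_{succ}(T, \Psi_2).$$
Under assumption A we have $\Psi_2 > \Psi_1$. Hence the proof of (ii). ■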
The last result allows us to extend the minority game with a single threshold to a minority game with several thresholds, which makes it possible to control the average number of active users in each class at equilibrium. Due to the complexity of the expressions, it is in general difficult to obtain an explicit solution of (12). We are, however, able to obtain a numerical solution, as shown in Fig. 2.

V. Distributed reinforcement learning algorithm 

In this section we introduce a distributed reinforcement learning algorithm that permits relays to adjust the strategies they play over time in the framework of the DTN MG designed in Section III. The analysis of the convergence of the algorithm relies on a stochastic model that gives rise to an associated continuous-time deterministic dynamical system. We prove that this process converges almost surely towards a stationary state, which is characterized as an $\epsilon$-approximate Nash equilibrium.

²This follows from the fact that the larger the number of active nodes, the smaller the probability that a tagged node obtains the reward.
Fig. 2. The mixed Nash equilibrium, multi-class case, where $g_1 = 0.8 \times 10^{-4}$, $g_2 = 0.5 \times 10^{-4}$, $r_2 = 0.15$, $\lambda = 0.03$, $\tau = 100$, $N_1 = 20$, $N_2 = 20$.

In DTNs, the limited computational power and low energy budget of relays require adaptive and energy-efficient mechanisms that let relays adapt to operating conditions at low cost. The learning algorithm proposed here matches this reality of DTNs since, as we shall see, it has the following attractive features:

• It is genuinely distributed: the strategy update decision is local to each relay;

• It depends only on the realized payoffs: nodes use local observations to estimate their own payoffs;

• It uses a simple behavioral rule, namely the logit rule.

We assume that each relay node $i$ has a prior perception $x_i$ of the payoff performance of each action (to be active or not), and makes a decision based on this piece of information using a random choice rule. The payoff of the chosen action is then observed and used to update the perception of that particular action. This procedure is repeated round after round, each round of duration $\tau$, generating a discrete-time stochastic process, which is the learning process.

For notation's sake, denote by $A = \{T, S\}$ the set of pure strategies and by $\Delta_i$ the set of mixed strategies of player $i$, $i \in \{1, \dots, N\}$. Let $V^i(\cdot)$ be the payoff function of player $i$. The algorithm works in rounds of duration $\tau$: at round $k$, each relay node $i$ takes an action according to a mixed strategy $\pi_i^k = \sigma_i(x_i^k) \in \Delta_i$. The fully mixed strategy is generated according to the vector $x_i^k = (x_{ia}^k)_{a \in A}$, which represents its perceptions of the payoffs of the available pure strategies. In particular, relay node $i$'s fully mixed strategies are mapped from the perceptions by the logit rule:
$$\sigma_{ia}(x_i) = \frac{e^{\beta x_{ia}}}{e^{\beta x_{iT}} + e^{\beta x_{iS}}}, \quad a \in A, \qquad (17)$$
where $\beta$ is commonly called the temperature of the logit. The temperature has a smoothing effect: $\beta \to 0$ leads to the uniform choice of strategies, while for $\beta \to \infty$ the probability concentrates on the pure strategy with the largest perception. We assume throughout that $\sigma_{ia}(x_i)$ is strictly positive for all $a \in A$.

At round $k$, the perceptions $x_i^k$ determine the mixed strategies $\pi_i^k = \sigma_i(x_i^k)$ that each player $i$ uses to choose at random action $T$ (to be active) or $S$ (to be silent). Then each player estimates its own payoff $u_i^k$, with no information about the actions or the payoffs of the other players, and uses this value to update the perception of the action $a_i^k$ it has just played as:
$$x_{ia}^{k+1} = \begin{cases} (1 - \gamma^k)\, x_{ia}^k + \gamma^k u_i^k & \text{if } a = a_i^k, \\ x_{ia}^k & \text{otherwise,} \end{cases} \qquad (18)$$
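Rules (17) and (18) translate into two small functions. In this sketch the perceptions of one relay are kept in a dictionary keyed by action; the function names are hypothetical, and the max-shift inside `logit` is a standard overflow guard that leaves (17) unchanged.

```python
import math


def logit(perceptions: dict, beta: float) -> dict:
    """Eq. (17): map perceptions {action: x_ia} to probabilities {action: sigma_ia}."""
    top = max(perceptions.values())  # shifting by the max avoids overflow in exp
    weights = {a: math.exp(beta * (x - top)) for a, x in perceptions.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def update(perceptions: dict, played: str, payoff: float, gamma: float) -> dict:
    """Eq. (18): only the perception of the action just played moves toward the payoff."""
    new = dict(perceptions)
    new[played] = (1 - gamma) * perceptions[played] + gamma * payoff
    return new


x = {"T": 0.2, "S": 0.0}
print(logit(x, beta=0.0))                     # {'T': 0.5, 'S': 0.5}
print(logit(x, beta=100.0))                   # almost all mass on T
print(update(x, "S", payoff=0.1, gamma=0.5))  # {'T': 0.2, 'S': 0.05}
```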
Algorithm 1 Distributed reinforcement learning algorithm
1: input: $k = 1$; each relay node $i$ chooses its action ($T$ or $S$) according to distribution $p_i$ and sets its initial perception values $x_{ia}^0 \approx 0$.
2: while $\max\left( |x_{iT}^{k} - x_{iT}^{k-1}|, |x_{iS}^{k} - x_{iS}^{k-1}| \right) > \delta$ (a small stopping tolerance) do
3: Each relay node $i$ updates its fully mixed strategy at iteration $k$ according to (17).
4: Relay node $i$ selects its action using its updated fully mixed strategy.
5: Relay node $i$ estimates its payoff $u_i^k$.
6: Relay node $i$ updates its perception values according to (18).
7: $k \leftarrow k + 1$
8: end while
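Algorithm 1 can be simulated end to end. The following is a minimal sketch under several assumptions that are not taken from the paper: the payoff estimate of step 5 is replaced by the expected zero-sum utility given the realized number $N_T$ of active relays (a silent relay receives $-U(T, N_T + 1)$, i.e., minus what it would have earned by joining), the averaging factors are $\gamma^k = 1/(k+1)$, and $N = 40$, $r = 0.1$, $g\tau/r = 1/15$, $\beta = 200$, the tolerance, the iteration cap and the random seed are hypothetical. `run_learning` is an illustrative name.

```python
import math
import random

LAM, TAU = 0.03, 100.0


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def prob_active(x_t: float, x_s: float, beta: float) -> float:
    """sigma_iT of the logit rule (17), written as a numerically stable logistic."""
    z = beta * (x_t - x_s)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def run_learning(n_relays=40, r=0.1, g=0.1 / (15 * TAU), beta=200.0,
                 tol=1e-5, max_iter=5000, seed=1):
    rng = random.Random(seed)
    u_t = [0.0] + [r * p_succ(n) - g * TAU for n in range(1, n_relays + 1)]  # U(T, n)
    x = [[0.0, 0.0] for _ in range(n_relays)]  # perceptions [x_iT, x_iS] of each relay
    for k in range(1, max_iter + 1):
        gamma = 1.0 / (k + 1)                                 # averaging factor
        probs = [prob_active(xt, xs, beta) for xt, xs in x]   # step 3, eq. (17)
        active = [rng.random() < p for p in probs]            # step 4
        n_t = sum(active)
        max_change = 0.0
        for i, is_active in enumerate(active):
            # Step 5, simplified: expected zero-sum payoff given the realized N_T.
            payoff = u_t[n_t] if is_active else -u_t[n_t + 1]
            a = 0 if is_active else 1
            new_value = (1 - gamma) * x[i][a] + gamma * payoff  # step 6, eq. (18)
            max_change = max(max_change, abs(new_value - x[i][a]))
            x[i][a] = new_value
        if max_change < tol:                                  # stopping rule of step 2
            break
    return [prob_active(xt, xs, beta) for xt, xs in x], k


probs, iterations = run_learning()
print(f"{iterations} rounds, mean probability of being active = {sum(probs) / len(probs):.2f}")
```

With a finite $\beta$ the learned strategies are logit responses to the perceptions rather than exact best responses, so they only approximate the mixed equilibrium, in line with Proposition 10.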



where $\gamma^k \in (0, 1)$ is a sequence of averaging factors that satisfy $\sum_k \gamma^k = \infty$ and $\sum_k (\gamma^k)^2 < \infty$ (for example, $\gamma^k = 1/(k+1)$). Each player only changes the perception of the strategy just used in the current round and keeps the other perceptions unchanged. Algorithm 1 summarizes the learning process. The discrete-time stochastic process expressed in (18) represents the evolution of relay node perceptions and can be written in the following equivalent form:



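$$x_{ia}^{k+1} - x_{ia}^k = \gamma^k \left( w_{ia}^k - x_{ia}^k \right), \quad \forall i \in \{1, \dots, N\},\ a \in A \qquad (19)$$

with

$$w_{ia}^k = \begin{cases} u_i^k & \text{if } a = a_i^k, \\ x_{ia}^k & \text{otherwise.} \end{cases} \qquad (20)$$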



In what follows we prove that this algorithm attains a steady state of the coordination process among players, while the information it needs to operate is minimal.

A. Convergence of the Learning Process 

Based on the theory of stochastic approximation algorithms, the asymptotic behavior of (19) can be analyzed through the corresponding continuous dynamics [4]:
$$\frac{dx}{dt} = E(w \mid x) - x, \qquad (21)$$
where $x = (x_{ia}, \forall i \in \{1, \dots, N\}, a \in A)$ and $w = (w_{ia}, \forall i \in \{1, \dots, N\}, a \in A)$.

Let us make equation (21) more explicit by defining the mapping from the perceptions $x$ to the expected payoff of user $i$ choosing action $a$ as $G_{ia}(x) = E(V^i \mid x, a_i = a)$.

Proposition 9: The continuous dynamics (21) may be expressed as
$$\frac{dx_{ia}}{dt} = \sigma_{ia} \left( G_{ia}(x) - x_{ia} \right). \qquad (22)$$

Proof: Using the definition of the vector $w$, the expected value $E(w \mid x)$ can be computed by conditioning on player $i$'s action as:
$$E(w_{ia} \mid x) = \pi_{ia}\, U(a, \pi_{-i}) + (1 - \pi_{ia})\, x_{ia} = \sigma_{ia} G_{ia}(x) + (1 - \sigma_{ia})\, x_{ia}, \qquad (23)$$
which together with (21) yields (22). ■
This can be interpreted as follows: when the difference between the expected payoff and the perception value is large, the perception value will, by (19), be updated with a large expected increment $w_{ia}^k - x_{ia}^k$, and this difference will be reduced.

In the following theorem, we prove that the learning process admits a contraction structure for a proper choice of the temperature $\beta$.
Theorem 1: Under the logit decision rule (17), if the temperature satisfies $\beta < \frac{1}{n_s r}$, then the mapping from the perceptions to the expected payoffs $G(x) = \left( G_{ia}(x), \forall i \in \{1, \dots, N\}, a \in A \right)$ is a maximum-norm contraction.

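Proof: Recall that $G_{ia}(x)$ is the expected payoff of relay node $i$ choosing action $a$, given the perceptions $x$ of all players. Assume the chosen action is to be active ($T$); then $G_{iT}(x)$ can be written as:
$$G_{iT}(x) = \sigma_{iT}(x_i) \sum_{j=1}^{N} C_{N-1}^{j-1}\, \sigma_{iT}(x_i)^{j-1} \left( 1 - \sigma_{iT}(x_i) \right)^{N-j} U(T, j).$$
Now consider the difference $G_{iT}(x_i) - G_{iT}(\hat{x}_i)$ for two arbitrary perceptions $x_i$ and $\hat{x}_i$ of relay node $i$:
$$\begin{aligned}
|G_{iT}(x_i) - G_{iT}(\hat{x}_i)| &= \Big| \sigma_{iT}(x_i) \sum_{j=1}^{N} C_{N-1}^{j-1} \sigma_{iT}(x_i)^{j-1} (1 - \sigma_{iT}(x_i))^{N-j} U(T, j) - \sigma_{iT}(\hat{x}_i) \sum_{j=1}^{N} C_{N-1}^{j-1} \sigma_{iT}(\hat{x}_i)^{j-1} (1 - \sigma_{iT}(\hat{x}_i))^{N-j} U(T, j) \Big| \\
&\le \Big| \sigma_{iT}(x_i)\, n_s r \sum_{j=0}^{N-1} C_{N-1}^{j} \sigma_{iT}(x_i)^{j} (1 - \sigma_{iT}(x_i))^{N-1-j} - \sigma_{iT}(\hat{x}_i)\, n_s r \sum_{j=0}^{N-1} C_{N-1}^{j} \sigma_{iT}(\hat{x}_i)^{j} (1 - \sigma_{iT}(\hat{x}_i))^{N-1-j} \Big| \\
&= |\sigma_{iT}(x_i)\, n_s r - \sigma_{iT}(\hat{x}_i)\, n_s r| = n_s r\, |\sigma_{iT}(x_i) - \sigma_{iT}(\hat{x}_i)|.
\end{aligned}$$
We know that $\sigma_{ia}(x_i)$ is continuously differentiable; hence, by the mean value theorem, there exists $\xi_{ia} = \hat{x}_{ia} + \delta (x_{ia} - \hat{x}_{ia})$ with $0 < \delta < 1$ such that:
$$\begin{aligned}
\sigma_{iT}(x_i) - \sigma_{iT}(\hat{x}_i) &= \frac{e^{\beta x_{iT}}}{\sum_{a' \in A} e^{\beta x_{ia'}}} - \frac{e^{\beta \hat{x}_{iT}}}{\sum_{a' \in A} e^{\beta \hat{x}_{ia'}}} \\
&= \beta \Big[ \frac{e^{\beta \xi_{iT}} \sum_{a' \neq T} e^{\beta \xi_{ia'}}}{\left( \sum_{a' \in A} e^{\beta \xi_{ia'}} \right)^2} (x_{iT} - \hat{x}_{iT}) - \sum_{a' \in A, a' \neq T} \frac{e^{\beta \xi_{iT}} e^{\beta \xi_{ia'}}}{\left( \sum_{a'' \in A} e^{\beta \xi_{ia''}} \right)^2} (x_{ia'} - \hat{x}_{ia'}) \Big] \\
&= \beta \Big[ C_T (x_{iT} - \hat{x}_{iT}) - \sum_{a' \in A, a' \neq T} C_{a'} (x_{ia'} - \hat{x}_{ia'}) \Big],
\end{aligned}$$
where $C_T = \frac{e^{\beta \xi_{iT}} \sum_{a' \neq T} e^{\beta \xi_{ia'}}}{\left( \sum_{a' \in A} e^{\beta \xi_{ia'}} \right)^2}$ and $C_{a'} = \frac{e^{\beta \xi_{iT}} e^{\beta \xi_{ia'}}}{\left( \sum_{a'' \in A} e^{\beta \xi_{ia''}} \right)^2}$. One can easily observe that $C_T = \sum_{a' \in A, a' \neq T} C_{a'}$ and $2 C_T \le 1$. Then:
$$|\sigma_{iT}(x_i) - \sigma_{iT}(\hat{x}_i)| \le \beta C_T |x_{iT} - \hat{x}_{iT}| + \sum_{a' \neq T} \beta C_{a'} |x_{ia'} - \hat{x}_{ia'}| \le 2 \beta C_T \|x - \hat{x}\|_\infty \le \beta \|x - \hat{x}\|_\infty. \qquad (24)$$
Combining the bound on $|G_{iT}(x_i) - G_{iT}(\hat{x}_i)|$ with (24), we obtain
$$|G_{iT}(x) - G_{iT}(\hat{x})| \le \beta\, n_s r\, \|x - \hat{x}\|_\infty.$$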






We obtain the same result when player $i$ chooses to be silent ($S$). Observing that, by the minority game rule, $G_{iT}(\cdot)\, G_{iS}(\cdot) < 0$, we conclude that if $\beta < \frac{1}{n_s r}$, $G(x)$ is indeed a maximum-norm contraction. ■
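Bound (24) is easy to probe numerically. The sketch samples random perception pairs and random temperatures (the sampling ranges and seed are hypothetical) and records the largest ratio $|\sigma_{iT}(x) - \sigma_{iT}(\hat{x})| / (\beta \|x - \hat{x}\|_\infty)$. Since $C_T = \sigma_{iT}(1 - \sigma_{iT}) \le 1/4$ at the intermediate point, the ratio cannot exceed $1/2$, which lies within the constant 1 used in (24).

```python
import math
import random


def sigma_t(x_t: float, x_s: float, beta: float) -> float:
    """sigma_iT of the logit rule (17) with two actions, computed stably."""
    z = beta * (x_t - x_s)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


rng = random.Random(0)
worst_ratio = 0.0
for _ in range(10_000):
    beta = rng.uniform(0.1, 20.0)
    x = (rng.uniform(-1, 1), rng.uniform(-1, 1))      # (x_iT, x_iS)
    x_hat = (rng.uniform(-1, 1), rng.uniform(-1, 1))
    dist = max(abs(a - b) for a, b in zip(x, x_hat))  # ||x - x_hat||_inf
    gap = abs(sigma_t(*x, beta) - sigma_t(*x_hat, beta))
    worst_ratio = max(worst_ratio, gap / (beta * dist))
print(f"max |delta sigma_T| / (beta * ||delta x||_inf) = {worst_ratio:.3f}")
```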

By the contraction mapping property, there exists a unique fixed point $x^*$ such that $G(x^*) = x^*$. In the following theorem we show that the distributed learning algorithm also converges to the same limit point $x^*$.

Theorem 2: If $G(x)$ is a $\|\cdot\|_\infty$-contraction, its unique fixed point $x^*$ is a global attractor for the adaptive dynamics (22), and the learning process (19) converges almost surely towards $x^*$. Moreover, the limit point $x^*$ is globally asymptotically stable.

Proof: Since $G(x)$ is a $\|\cdot\|_\infty$-contraction, it admits a unique fixed point $x^*$. According to general results on stochastic algorithms, the rest points of the continuous dynamics (22) are natural candidates to be limit points of the stochastic process (19). Together with ([4], Corollary 6.6), we obtain the almost sure convergence of (19), provided that we exhibit a strict Lyapunov function $\phi$.

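Now let $\phi(x) = \|x - x^*\|_\infty$; then $\phi(x^*) = 0$ and $\phi(x) > 0$, $\forall x \neq x^*$. Let $i \in \{1, \dots, N\}$, $a \in A$ be such that $\phi(x) = |x_{ia} - x^*_{ia}|$. If $x_{ia} > x^*_{ia}$, then $\phi(x) = x_{ia} - x^*_{ia}$. Since $G$ is a maximum-norm contraction, there exists a Lipschitz constant $\ell < 1$ such that $G_{ia}(x) - G_{ia}(x^*) \le \ell \|x - x^*\|_\infty = \ell (x_{ia} - x^*_{ia})$, and $G_{ia}(x^*) = x^*_{ia}$. Combining this with equation (22), we can write:
$$\begin{aligned}
\frac{d\phi(x)}{dt} = \frac{d(x_{ia} - x^*_{ia})}{dt} = \frac{dx_{ia}}{dt} &= \sigma_{ia} \left( G_{ia}(x) - x_{ia} \right) = \sigma_{ia} \left( G_{ia}(x) - G_{ia}(x^*) + x^*_{ia} - x_{ia} \right) \\
&\le \sigma_{ia} \left( \ell (x_{ia} - x^*_{ia}) + x^*_{ia} - x_{ia} \right) = -(1 - \ell)\, \sigma_{ia}\, \phi(x) < 0, \quad \forall x \neq x^*,
\end{aligned}$$
and a similar argument for the case $x_{ia} < x^*_{ia}$ also shows that $\frac{d\phi(x)}{dt} < 0$, $\forall x \neq x^*$. Thus $\phi(x)$ is a strict Lyapunov function and $x^*$ is globally asymptotically stable, hence the proof. ■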
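The dynamics (22) can be integrated directly to watch $x$ settle at the fixed point $x^* = G(x^*)$. This is a sketch under simplifying assumptions: all $N = 40$ relays share the same perceptions (a symmetric reduction), $G_T$ is the expected zero-sum utility under the two-hop model of Section VI with hypothetical $r = 0.1$ and $g\tau/r = 1/15$, a silent relay earns $-G_T$, the temperature is a small $\beta = 0.5$ in the spirit of Theorem 1, and the explicit Euler step size is arbitrary.

```python
import math

LAM, TAU = 0.03, 100.0
N, R, G = 40, 0.1, 0.1 / (15 * TAU)  # hypothetical homogeneous game
BETA, DT, STEPS = 0.5, 0.1, 2000       # small temperature and Euler step size


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


U_T = [0.0] + [R * p_succ(n) - G * TAU for n in range(1, N + 1)]  # U(T, n)


def sigma_t(x_t: float, x_s: float) -> float:
    """Logit probability (17) of being active; |BETA * (x_t - x_s)| stays small here."""
    return 1.0 / (1.0 + math.exp(-BETA * (x_t - x_s)))


def expected_payoffs(x_t: float, x_s: float):
    """(G_T(x), G_S(x)) when every relay holds the perceptions (x_t, x_s)."""
    p = sigma_t(x_t, x_s)
    g_t = sum(math.comb(N - 1, m) * p ** m * (1 - p) ** (N - 1 - m) * U_T[m + 1]
              for m in range(N))
    return g_t, -g_t  # zero-sum: a silent relay earns minus the payoff of being active


x_t, x_s = 0.0, 0.0
for _ in range(STEPS):  # explicit Euler steps of dx_ia/dt = sigma_ia (G_ia(x) - x_ia)
    s_t = sigma_t(x_t, x_s)
    g_t, g_s = expected_payoffs(x_t, x_s)
    x_t, x_s = x_t + DT * s_t * (g_t - x_t), x_s + DT * (1 - s_t) * (g_s - x_s)

g_t, g_s = expected_payoffs(x_t, x_s)
residual = max(abs(g_t - x_t), abs(g_s - x_s))  # ||G(x) - x||_inf
print(f"sigma_T = {sigma_t(x_t, x_s):.3f}, residual = {residual:.1e}")
```

With such a small $\beta$ the fixed point stays close to $\sigma_T = 1/2$, far from the mixed equilibrium of the game; this is the cost of a small temperature, which Proposition 10 quantifies through $\epsilon$.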

B. Approximate Nash Equilibrium 

From the definition of $G_{ia}$ and Theorem 2, we have:
$$G_{ia}(x^*) = E(V^i \mid x^*, a_i = a) = x^*_{ia}.$$
This is a property of the equilibrium $x^*$ of the distributed learning algorithm: its value $x^*_{ia}$ is an accurate estimate of the expected payoff at the equilibrium. Moreover, we show that the fully mixed strategy $\sigma(x^*)$ is an approximate Nash equilibrium.

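Proposition 10: Under the logit decision rule (17), the fully mixed strategy $p^* = \sigma^*(x^*)$ at the equilibrium $x^*$ is an $\epsilon$-approximate Nash equilibrium for our game, with
$$\epsilon = \max_{i \in \{1, \dots, N\}} \Big\{ -\frac{1}{\beta} \sum_{a \in A} \sigma^*_{ia} \left( \ln \sigma^*_{ia} - 1 \right) \Big\}.$$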
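Proof: A well-known characterization of the logit probabilities gives:
$$\sigma^*_i = (\sigma^*_{iT}, \sigma^*_{iS}) = \arg\max_{\sigma_i \in \Delta_i} \Big\{ \sum_{a \in A} \sigma_{ia} E(V^i \mid x^*, a_i = a) - \frac{1}{\beta} \sum_{a \in A} \sigma_{ia} \left( \ln \sigma_{ia} - 1 \right) \Big\},$$
that is,
$$\sigma^*_{ia} = \frac{e^{\beta E(V^i \mid x^*, a_i = a)}}{e^{\beta E(V^i \mid x^*, a_i = T)} + e^{\beta E(V^i \mid x^*, a_i = S)}} = \frac{e^{\beta x^*_{ia}}}{e^{\beta x^*_{iT}} + e^{\beta x^*_{iS}}},$$
and since ([5], p. 93)
$$\max_{\sigma_i \in \Delta_i} \Big\{ \sum_{a \in A} \sigma_{ia} E(V^i \mid x^*, a_i = a) - \frac{1}{\beta} \sum_{a \in A} \sigma_{ia} \left( \ln \sigma_{ia} - 1 \right) \Big\} \ge \max_{\sigma_i \in \Delta_i} \sum_{a \in A} \sigma_{ia} E(V^i \mid x^*, a_i = a),$$
we have:
$$\sum_{a \in A} \sigma^*_{ia} E(V^i \mid x^*, a_i = a) \ge \max_{\sigma_i \in \Delta_i} \sum_{a \in A} \sigma_{ia} E(V^i \mid x^*, a_i = a) - \epsilon,$$
where $\epsilon = \max_{i \in \{1, \dots, N\}} \{ -\frac{1}{\beta} \sum_{a \in A} \sigma^*_{ia} (\ln \sigma^*_{ia} - 1) \}$.

Hence the fully mixed strategy $p^* = \sigma^*(x^*)$ at the equilibrium $x^*$ is an $\epsilon$-approximate Nash equilibrium. ■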

Observe that the parameter $\epsilon$ illustrates the effect of the temperature $\beta$: a larger $\epsilon$ (smaller $\beta$) means worse learning performance.
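The bound of Proposition 10 can be evaluated for a single relay. The sketch uses two hypothetical expected payoffs for $T$ and $S$, computes the logit strategy for several values of $\beta$, and compares the shortfall with respect to the best pure action against the per-relay $\epsilon$; the function names are illustrative.

```python
import math


def logit(payoffs, beta):
    """Eq. (17) applied to expected payoffs (at x* they coincide with the perceptions)."""
    top = max(payoffs)
    weights = [math.exp(beta * (u - top)) for u in payoffs]
    total = sum(weights)
    return [w / total for w in weights]


def epsilon(sigma, beta):
    """Per-relay term of epsilon in Proposition 10."""
    return -sum(s * (math.log(s) - 1) for s in sigma) / beta


payoffs = (0.3, 0.1)  # hypothetical E(V | x*, a_i = T) and E(V | x*, a_i = S)
for beta in (1.0, 10.0, 100.0):
    sigma = logit(payoffs, beta)
    achieved = sum(s * u for s, u in zip(sigma, payoffs))
    shortfall = max(payoffs) - achieved  # gap to the best pure action
    print(f"beta = {beta:6.1f}: shortfall = {shortfall:.4f} <= epsilon = {epsilon(sigma, beta):.4f}")
```

Both the shortfall and $\epsilon$ shrink as $\beta$ grows, which is the behavior described above.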

VI. Application: Two-hop routing and exponential inter-contacts

In the previous sections we showed, in a general DTN context, how a controlled minority game can be used to induce a stable cooperative behavior among the relays without actual cooperation. So far we assumed that the inter-contact time between nodes follows a generic distribution and that relay nodes can adopt any relaying policy.

In this section, and for the numerical analysis, we assume that relay nodes use the two-hop routing scheme, in which any mobile that receives a copy of the packet from the source can only forward it to the destination. The time between subsequent contacts of a node with any other node in the network is now assumed to follow an exponential distribution with parameter $\lambda > 0$. The validity of this model for synthetic mobility models has been discussed in [2]. Regarding the rewarding policy adopted by the source nodes, we assume that upon successful delivery of a message, the relay node receives a positive reward $R$ if and only if it is the first one to deliver the message to the corresponding destination.

Under these assumptions, we can obtain the expressions of the different quantities; in particular, the probability that an active node relays a copy of a received packet to the destination within time $\tau$ is $1 - Q_\tau$, where $Q_\tau$ is given by [3]:
$$Q_\tau = (1 + \lambda \tau)\, e^{-\lambda \tau}. \qquad (25)$$

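Now, the probability of successful delivery of the message for an active node is:
$$P_{succ}(T, N_T) = (1 - Q_\tau) \sum_{k=1}^{N_T} C_{N_T-1}^{k-1}\, \frac{(1 - Q_\tau)^{k-1}\, Q_\tau^{N_T-k}}{k},$$
where the factor $\frac{1}{k}$ is the probability that the tagged node is the first of the $k$ relays that reach the destination within $\tau$, since each node seeks to be the first to deliver a given message to its destination.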
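The two-hop expressions can be evaluated directly. Using $C_{N_T-1}^{k-1}/k = C_{N_T}^{k}/N_T$ and the binomial theorem, the sum also collapses to $P_{succ}(T, N_T) = (1 - Q_\tau^{N_T})/N_T$; the sketch below (function names hypothetical, $\lambda = 0.03$ and $\tau = 100$ as in Section VII) checks this identity numerically and confirms that $P_{succ}$ decreases with the number of active relays.

```python
import math


def q_tau(lam: float, tau: float) -> float:
    """Eq. (25); 1 - Q_tau is the probability of reaching the destination within tau."""
    return (1 + lam * tau) * math.exp(-lam * tau)


def p_succ(n_active: int, lam: float = 0.03, tau: float = 100.0) -> float:
    """P_succ(T, N_T): the tagged relay delivers and is first among the k that deliver."""
    q = q_tau(lam, tau)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


def p_succ_closed(n_active: int, lam: float = 0.03, tau: float = 100.0) -> float:
    """Closed form obtained with C(N-1, k-1) / k = C(N, k) / N."""
    return (1 - q_tau(lam, tau) ** n_active) / n_active


print(f"Q_tau = {q_tau(0.03, 100.0):.4f}")  # equals 4 * e^-3
for n in (1, 15, 30):
    print(n, round(p_succ(n), 4), round(p_succ_closed(n), 4))
```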
Fig. 3. Learning the mixed strategy: homogeneous case.



VII. Numerical Results 

In this section, we provide a numerical analysis of the performance achieved by DTN nodes following the distributed reinforcement learning mechanism proposed in Section V. First, we focus on the performance achieved in a homogeneous network where all nodes have the same energy constraint ($g$). Second, we examine the performance of our algorithm in a multi-class framework (heterogeneous DTN), where we consider two classes of nodes. Then we verify the intuitive result obtained in Proposition 8, which states that by allocating a smaller reward to a class, fewer nodes of this class will choose to be active. The results presented here use the utility functions defined in Scenario 1. The parameters $\lambda = 0.03$ and $\tau = 100$ are used throughout the numerical analysis.

Homogeneous DTN: The performance of our learning algorithm in the homogeneous case is shown in Fig. 3. In this case we consider g = 6.6 x 10~^ and $N = 40$. We fix the sequence of averaging factors $\gamma^k$ for all iterations $k$ and let the temperature $\beta \to \infty$; this choice of $\beta$ is convenient since it allows our algorithm to attain the Nash equilibrium (Proposition 10).

In Fig. 3(a) we observe that the probability of being active of a node $i$ ($p_i$, $\forall i \in \{1, \dots, N\}$) converges to the symmetric equilibrium ($p^* = 0.35$), which is the solution of $A(N, p^*) = 0$. Moreover, it is interesting to notice that the average number of active nodes at the equilibrium approaches the value $\Psi = 15$, where $\Psi$ defines the comfort level of the minority game in pure strategies (Fig. 3(b)). Such behavior is, in fact, a convergence to the strictly mixed Nash equilibrium discussed in Proposition 3. The same observation holds in Fig. 3(c,d), where a smaller energy consumption parameter g = 3.3 x lO""^ yields a larger activation rate, as can be noticed in the convergence of $p_i$ to the mixed Nash equilibrium ($p^* = 0.75$). As a result, there are on average more active nodes ($\Psi = 30$) at the equilibrium.
Heterogeneous DTN: The performance of the learning algorithm in the heterogeneous DTN is investigated in two cases: symmetric (i.e., when $\frac{g_1}{r_1} = \frac{g_2}{r_2}$) and asymmetric ($\frac{g_1}{r_1} \neq \frac{g_2}{r_2}$). We first consider the symmetric case, with $g_1 = 0.8 \times 10^{-4}$, $g_2 = 0.5 \times 10^{-4}$, $N_1 = 20$, $N_2 = 20$; setting $r_2 = 0.15$ we obtain $r_1 = 0.24$. In Fig. 4(a) we observe that the probabilities of being active of the nodes of both classes ($p_1$, $p_2$) converge to the symmetric Nash equilibrium discussed in Proposition 8, and the value they converge to ($p_1 = p_2 = 0.78$) is the solution of the equations $A_1(N, p_1, p_2) = A_2(N, p_1, p_2) = 0$, as shown in Fig. 2. The average number of active nodes, depicted in Fig. 4(b), converges to $\Psi = 30$, which satisfies relation (10).
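The parameters of the symmetric case can be cross-checked with (10). A minimal sketch, assuming $g_1 = 0.8 \times 10^{-4}$ and $g_2 = 0.5 \times 10^{-4}$, the two-hop $P_{succ}$ of Section VI, and a hypothetical helper that returns the integer $n$ whose $P_{succ}(T, n)$ is closest to $g_j\tau/r_j$: the symmetric condition gives $r_1 = 0.24$, and both classes point to a threshold of 30 active relays.

```python
import math

LAM, TAU = 0.03, 100.0


def p_succ(n_active: int) -> float:
    """P_succ(T, N_T) of the two-hop model of Section VI."""
    q = (1 + LAM * TAU) * math.exp(-LAM * TAU)  # Q_tau, eq. (25)
    return (1 - q) * sum(math.comb(n_active - 1, k - 1) * (1 - q) ** (k - 1)
                         * q ** (n_active - k) / k for k in range(1, n_active + 1))


g1, g2 = 0.8e-4, 0.5e-4  # energy costs (assumed 10^-4 scale)
r2 = 0.15
r1 = r2 * g1 / g2        # symmetric case: g1 / r1 = g2 / r2
print(f"r1 = {r1:.2f}")  # r1 = 0.24


def nearest_threshold(r: float, g: float, n_max: int = 40) -> int:
    """Integer n whose P_succ(T, n) is closest to g * tau / r, i.e. the best fit of (10)."""
    target = g * TAU / r
    return min(range(1, n_max + 1), key=lambda n: abs(p_succ(n) - target))


print(nearest_threshold(r1, g1), nearest_threshold(r2, g2))  # 30 30
```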

In Fig. 5, we depict the asymmetric case, when $g_1 > g_2$ and $r_1 < \frac{g_1}{g_2} r_2$. In Fig. 5(a,c,e) we observe that $p_2 > p_1$; in other words, the nodes with the higher energy constraint (class 1) are less active: by allocating a smaller reward $r_1$, fewer nodes of class 1 are active. Notice in Fig. 5(b,d,f) that the average number of active nodes satisfies $\Psi_1 < N_T < \Psi_2$.

VIII. Conclusions 

Coordination of mobiles that are part of a DTN is a difficult task due to the lack of permanent connectivity. Operations in DTNs, in fact, do not support the use of timely feedback to enforce cooperative schemes that may be implemented on mobile nodes. Nevertheless, coordination is indeed worthwhile in order to attain an efficient usage of resources. Moreover, selfish behavior and activation control become central when owners of relay devices need incentives to spend memory and battery.

To this respect, our paper provides a novel mechanism designed using the theory of Minority Games (MGs). MGs are non-cooperative games that apply to contexts where the payoff of players decreases with the number of those who compete. We designed a reward mechanism for two-hop routing protocols that runs fully distributed, with no need for any dedicated coordination protocol. That is, the source controls how many nodes to activate in order to attain a target message delivery probability. It does so by setting the reward for the nodes that deliver first in such a way as to avoid overprovisioning of activated relays. Finally, we developed a distributed stochastic learning algorithm able to converge to the optimal solution.

Future work will investigate how to extend the models and the convergence properties of our algorithm to other types of networks, such as cognitive radio and peer-to-peer networks.